• Open

    [D] How to compute the distance between two high-dimensional distributions?
    Hey all, I am generating a set of extra MNIST digits for a research project, and I am interested in somehow computing the distance between the distribution these digits represent and the distribution that the MNIST train set, for example, represents. The issue is that it seems like typical methods (Jensen-Shannon, Wasserstein, etc.) collapse at high dimensions. Is there a consensus solid approach to do this nowadays? Thanks! submitted by /u/SignificantSundae793 [link] [comments]  ( 9 min )
  • Open

    AGI/Singularity is overhyped.
    Greetings! I would like to begin by stating that I understand why one has much hope in such technologies. The world as we know it is in a drastic shift, and it's hard to think of what it's going to become, and so many cling to hopeful ideas that give promises. AGI/Singularity doesn't have a grounding basis in evidence, or research. It's all theoretics, and the foundation for each technology is quite weak. You see, the mind is a sensorial parsing relational network. All of our sensorial experience is incorporated into a world-model, and thus it begins to rationalize, and be lucid of the environment. I don't think it's possible to re-create this kind of experience with a linear instruction set, let alone neuromorphic computing, or wetware. Each has to be built from the bottom-up with immense precision, and thus far we don't understand the mind. Realistically speaking everything is consciousness, and integrating that idea is the only way forward. tl;dr Replicating cognition is a completely theoretical endeavor, and requires vast amounts of understanding in regards to the nature of reality, not just the quantum, but the unique stochastic behavior of each higher-ordered system. submitted by /u/lucy_chxn [link] [comments]
    AI designs new robot from scratch in seconds
    submitted by /u/liberty4now [link] [comments]

  • Open

    [D] - Synthetic dataset - Searching for honest comparison between LLM (gpt4, bizon, jurassic-2, Claude...)
    I'm looking for resources, papers, or experiences that compare the performance of large language models (LLMs). I'm trying to find a honest benchmark to compare the capabilities of the latest large models, while really intrested un those: GPT-3.5 Instruct, GPT-4, Claude 2, Claude Instant 100k, Palm2-Bizon, jurassic-2, LLama2 70 and other state-of-the-art LLama2 fine tunes (possibly an Orca-style model). I'm interested in general benchmarks and, if they exist, comparisons of performance on synthetic data generation tasks (both generating data with the "textbook are all you need" approach used in Phi and some Orca/EvolveInstuct-style models like Wizard...). submitted by /u/Distinct-Target7503 [link] [comments]
    [P] How to extract and count artist mentions from messy text data using LLMs
    I have a long list of responses from a poll (in this case, we've asked our Facebook community we should have at our music festival). Our goal is to count the total mentions for each artist, but the data quality is low. Here is some sample data: Rena Guinn and the Gentlemen Blackwater Railroad Company Mo' Mojo Music !! We would love to be apart of this awesome event! Amazing!!!!! The Rollin' Rust came threw at the #falldownfest last weekend 🙂 much love:) keep it up boys 🙂 Luke Hess Langhorne Slim!!!!!, Sierra Hull, First Aid Kit, Jim Lauderdale (always) We feel the data quality is too poor for basic LDA approaches (lots of misspellings, odd phrasings) and we feel a LLM would be best at least extracting the names of artists using context. We have found that ChatGPT and Claude are decent at the extraction tasks on small samples but can't handle the full input, and are next to worthless on the counting task. We've tried very specific and differnet prompts, but haven't been able to get a good result. So how should I approach this problem? I'm not sure how to break this down in to prompts or substeps. I'm not sure how to do anything of this outside of a browser, and I'm a data science novice, but willing to learn some things. Here's an example of a prompt that's not returning correct counts (off by >50% in most cases) The following is raw text comments copied from a poll. Count the total number of mentions in the poll and create a table that contains columns Band (a unique list of bands) and a column containing the total number of mentions. The table should cover the top 100 bands by total mentions. Use judgement and context to conform band names in to unique values (Example: The Town Pants, Town Pants, townpants are all the same band). Count completely and accurately. Now here is the raw data: submitted by /u/strway2heaven77 [link] [comments]  ( 10 min )
    [P] Avenues for publishing AI ethics case studies?
    I am a computer science graduate student. As part of my coursework, I am exploring the ethical issues of using Large Language Models for mental healthcare applications. I found four unique examples from the real world and outlined the ethical dilemma within them. I intend to analyze these dilemmas using various ethical frameworks in order to come up with solutions. While I am interested in getting a publication out of this work, I am unsure of the types of conferences/journals that accept case-study articles (specifically in AI ethics). Any advice from academicians over here would be greatly appreciated! submitted by /u/jwalapoet [link] [comments]  ( 9 min )
    [D] [R] Is the noise predictor in DDPMs predicting the noise added to x_0 or the noise added to x_{t-1}?
    Hi fellow computer scientists, ​ After reading the paper Improved Denoising Diffusion Probabilistic Models I got a little confused. Looking at section "2.2. Training in Practice" the authors say that: 1) "The network could also predict the noise eps added to x_0, and this noise could be used to predict x0 via..." ​ 2) "Ho et al. (2020) found that predicting eps worked best..." ​ So this left me wondering if the noise predictor is trying to compute (1) the epsilon that was added to x_0 through the close-form formula or (2) the noise added in the previous timestep to obtain x_t from x_{t-1} (i.e., eps_t or eps_{t-1}, idk...)? ​ Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
    [P] MazeGPT - Transformer based maze generator
    Hello all, I recently did a summer research project implementing GPT-2 to generate mazes. The core concept of the model is to combine a bunch of popular maze generation algorithms into one. The goal was that the transformer will be able to identify key components using self-attention and piece together different algorithms. Most maze generation algorithms result in almost a finger print (like in chaos theory). The end goal was to mimic a higher degree of randomness / make the mazes appear less algorithmic. I'm dipping my toes into the realm of research and am looking for feedback. So far I've run the model for 5x5 mazes, it would be interesting to try training the model with varying dimensions. Feel free to join in and contribute to the project! https://github.com/noah-hein/mazeGPT 5x5 live generation https://i.redd.it/v6smbdd88gsb1.gif ​ submitted by /u/noah-hein [link] [comments]  ( 9 min )
    [D] Unable to improve binary classification problem accuracy
    I am currently working on a binary classification problem where I aim to predict whether a customer will make a purchase in the next 30 days based on their transaction history. I have a dataset of 1,000 transactions with the following features: TransactionAmount (float): The amount of the transaction. ProductCategory (categorical): Category of the product purchased (e.g., Groceries, Electronics, Books). DateOfPurchase (datetime): The date on which the transaction occurred. I've done some preprocessing and feature engineering, including normalization, one-hot encoding of categorical variables, creating interaction terms, and adding features like days since the first purchase and whether the purchase was made during the holiday season.Dataset is balanced and cleaned. I started with a base Random Forest classifier with default parameters as a starting point, but the performance is not satisfactory (accuracy = 48.5%, ROC-AUC = 0.485). I tried other models as well but was unable to improve the accuracy by more than 57%. submitted by /u/SnooTigers4634 [link] [comments]
    [D] EMNLP 2023 results
    Making a post for EMNLP 2023 results to come out today. submitted by /u/East-Beginning9987 [link] [comments]  ( 8 min )
    [P] Need help figuring out my input for anomaly detection in frequency responses
    I’ve been given a task to identify if a PCB is faulty or not based on its frequency response. I don’t have labeled data. The data I have are various gain values calculated over frequencies, so my data looks something similar to the table below. PCB | Frequency | G1 | G2 PCB 1 | 1Hz | 0.1 | 1 PCB 1 | 2Hz | 0.2 | 2 PCB2 | 1Hz | 0.3 | 3 PCB2 | 2Hz | 0.4| 4 Each PCB has several G parameters measurements taken over the same set of frequencies. I need to use an auto encoder to identify outliers and I need help in deciding how my feature matrix should look like. For example, let us consider only one data point that is PCB 1, then would a matrix like this make sense? [[ 0.1 0.2 ] - 1st row is all G1 values [1 2]] - 2nd row is all G2 values Similarly the matrix for the other PCBs are also created. I have not included frequency in my feature set because these G parameters have been measured for the same set of frequencies for all PCBs. Is this correct ? Additionally, are there any resources someone can point me to related to finding anomalies in frequency response data ? I am struggling with using the keywords while googling. submitted by /u/Savage_Garbage [link] [comments]
    [R] Towards Monosemanticity: Decomposing Language Models With Dictionary Learning. From Anthropic. "We demonstrate a method for decomposing groups of neurons into interpretable features [...]".
    Paper. I am not affiliated with this paper or its authors. Twitter thread (Nitter alternative for those who want to see the entire thread without being logged into Twitter). Related work: Sparse Autoencoders Find Highly Interpretable Features in Language Models. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [R] Meta researchers present method for decoding speech from brain waves
    Researchers at Meta trained a deep learning model on brain recordings and audio data from 169 people listening to speech. Their method achieves up to 73% accuracy at identifying a 3-second clip of speech from non-invasive EEG or MEG scans. This is a massive improvement over previous attempts at decoding speech from neural signals. It approaches the performance of studies using implanted electrodes. The key innovations: A contrastive loss function that aligns latent speech and brain representations Leveraging pretrained speech models like wav2vec 2.0 Training one model on multiple subjects with individual tuning Being able to decode speech intention from brainwaves could one day help restore communication for patients suffering from strokes, ALS, etc. There's still a ways to go before this becomes a medical reality. Performance needs to improve and be validated during speech production rather than just passive listening. And the accuracy isn't high enough for natural conversations. But this is a hugely promising step toward brain-computer interfaces. Really interesting work at the intersection of neuroscience and AI! TLDR: New model achieves up to 73% accuracy decoding speech directly from non-invasive brain scans. Could eventually help patients with neurological conditions communicate just by thinking. Full summary here. Paper is here submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] EMNLP 2023 Notification
    Discussion thread for EMNLP 2023 notifications which will be released in a few hours along with GEM workshop. Best of luck to everyone. submitted by /u/EDEN1998 [link] [comments]  ( 9 min )
    [D] ordinal or nominal variable?
    Hey all, I am working with stock market data and scratching my head if certain variables are ordinal and can be left as is or if it is nominal and should be one-hot encoded. One of the variables in question consists of the direction of the market over a certain time. It has three categories: up, down, sideways. hope was to code them as 1, -1 and 0 respectively and treat as ordinal. There appears to be some order/relationship between them but not sure if it is enough. Is this the correct approach or should it be one-hot encoded? submitted by /u/Fishpo0 [link] [comments]  ( 9 min )
    [D] Deep Learning online course using PyTorch
    I've been out of the deep learning space for a while now and I'd like to take an online course, or set of courses, to get myself back up to speed on the latest techniques, architectures, and how to use them. I think the DeepLearning.ai specialization through Coursera is a good match, but I see that it uses Tensorflow. Is there any course like this that would use PyTorch? Or would the transition not be too hard once the fundamentals are in place? Thanks! submitted by /u/ComicFoil [link] [comments]
    Fine Tuning or RAG for Coding [D]
    Need some help what is the best way to start. Pls Advice ! I have a specific code in my repos (lets say .net + JS). The goal is to have prompt based code adjustments to existing repos (like very focused copilot) . Either using single agent or using something like AutoGen. So let say I have thousands of files with code and some descriptions about code functionality (spec) . I want either to generate code based on next spec and I want newly generated code to be similar in style to what is in my repos. So now questions: Should I vectorize my code (What is best way to do that ?) or try to fine tune some model ? Give me your ideas / experience in code generation based on previous code. submitted by /u/mcwin1 [link] [comments]  ( 9 min )
    [Project] I built an open-source scraping API that returns structured JSON data using GPT.
    I decided to open-source my own web scraping API that I'm using to get information from different websites without using any selectors or XPath. Just provide the URL and a desired JSON schema, and it will return extracted data. Hope this can be helpful for someone. Cheers! https://github.com/semanser/JsonGenius https://preview.redd.it/icq1i8slvesb1.png?width=4096&format=png&auto=webp&s=ac86ccdb3da5ef1ffa86e3473619162f6b652ac6 submitted by /u/semanser [link] [comments]  ( 9 min )
    [R] Is self-correction a viable method to improve LLM reasoning? Probably not.
    Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively. The results show that unaided, LLMs struggle at self-correction for reasoning tasks. The core issue is LLMs have trouble reliably evaluating the correctness of their own responses. They rarely identify flaws in initial reasoning. Sometimes LLMs even alter initially correct responses to become incorrect after self-correction! (I've personally seen this when interacting with ChatGPT many times and you probably have too). More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (Well, some interesting parallels to this paper here about implicit improvement from preference data vs traditional RLHF). Self-correction does show promise for things like making responses more polite or safe though. Criteria there are more clear-cut. The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road. TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] NIDDK-CR Data Centric Challenge: Enhancing NIDDK datasets for future artificial intelligence applications
    Calling all AI researchers! Using data aggregation, harmonization, fusion, and other data enhancement methods, you can help the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) enhance the utility of NIDDK datasets for AI applications. The goal of the NIDDK Data Centric Challenge will be to generate an “AI-ready” dataset that can be used for future data challenges, using data on Type 1 Diabetes available through the NIDDK Central Repository. Register today! https://www.challenge.gov/?challenge=niddk-central-repository-data-centric-challenge submitted by /u/DataCentricChallenge [link] [comments]
    [D] off-topic, is Meta Llama 2 license agreement safe to sign for commercial use ?
    in the Meta Llama 2 license agreement (that can be found here), there is a section of "Prohibited Uses" that clearly states several use cases that the signer must accept upon himself, but several of them state the word "facilitate", as far as i can understand, if we use Llama 2 as part of a commercial product, and some end-user will use the product in malicious way (say cause the chat-bot to write the recipe of mustard gas) then this could be considered that the creator of the product is facilitating the end-user, ​ so my questions are: do you think this is a fair interpretation of the agreement ? does that mean the creator is liable to whatever the model spit out ? is there a way to censor the model (short of retraining a new model, or fine-tune on a large scale) ? is there an open source model that already gone through the process, and more safe for commercial use ? ​ https://preview.redd.it/3zo3tm4e8esb1.png?width=1197&format=png&auto=webp&s=8aa522183f82ba8f85edb69cbaabd93262efd516 ​ as per @gentlecucumber advice, i also posted it on r/legaladvice: https://www.reddit.com/r/legaladvice/comments/170ll2t/d_is_meta_llama_2_license_agreement_safe_to_sign/?utm_source=share&utm_medium=web2x&context=3 submitted by /u/Particular_Flower_12 [link] [comments]
    [D] TesseractOCR vs PaddleOCR vs EasyOCR for Japanese text extraction
    Which would be the best OCR toolkit to invest the effort to learning and building a pipeline for an OCR system that will be used to extract Japanese text? I tried Tesseract initially and although I got some good results, I found it hard to do finetuning due to messy and outdated documentation. I haven't had the time to look at the other two OCR tools yet but if anyone had any experience, please do share them especially with how easy or difficult is the finetuning process as well as deploying the tuned models. submitted by /u/Spitfire_ex [link] [comments]  ( 9 min )
    [D] Adapting OpenSource GPT Models - requirements/possibilities?
    Hi, our company plans for some budget in 2024 to invest into hardware to do the following - running local LLMs for our coworkers to interfere with an locally running offline GPT alike ChatGPT. Use cases: generating templates for email, letters etc Translation (EN/GER/FR/SPA) Querying internal knowledge bases and/or FAQs/HOWTOs I did some research but it is still hard for me to estimate what are the HW / AI skill requirements to implement something not a quarter as good as ChatGPT. Ive played with Nomics gpt4all which comes close to a baseline. We cant use cloud services due to our data privacy policy, so I checked on what would be a good starting point to invest into hardware. I came up with a gamer PC (octacore Intel i9/AMD Ryzen 7) utilizing NVidia RTX 4090 (24Gb) / Radeon RX 7900 / 2TB SSD / 64Gb RAM for approximately 3600 Eur. I am pretty sure that would be sufficient to host a decent LLM serving simultaneous client requests. But is there also a way to adapt / process our companies data? Most sources state that proper LLMs were trained using hundreds of NVidia A100 and thousands of CPUs. On the other hand we would be fine with just fine-tuning a pretrained model. Could you please point me to some sources to learn more about possibilities and requirements as to be able to make well-informed investment decisions? Also, we probably lack the required skills, and would be interested to learn if there are companies and/or projects assisting with this kind of task? thanks submitted by /u/EatTFM [link] [comments]  ( 9 min )
    [D] - Are LoRAs able to improve results on reasoning benchmarks or is full-parameter fine tuning required?
    Is there any good research on which benchmarks LoRAs are most effective at impacting, or are they relegated mostly to changing the style of an LLM's response? submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [D] How to test if regression model is statistically significantly better, including its test error?
    I have a regression model, predicting a popularity of a text. I have its performance metrics on test set, e.g. RMSE and MAE. This gives me an uncertainty estimate about its predictions. Now I want to transform the text in some way, e.g. give it to human experts or another model to "upgrade" (in terms of getting better popularity). So I have the original and transformed text. Now I have 3 popularity scores: true popularity for original text predicted popularity for original text predicted popularity for transformed text Obviously, if model MAE is for example around 5, and predicted popularity for transformed text is higher than for the original by 1.5, this can be totally random, due to errors in the model prediction. How can I measure if text transformation is beneficial, i.e. statistically significantly better than the original text, incorporating information about model quality? Requiring that the improvement has to be higher than model error would be incredibly strict. submitted by /u/qalis [link] [comments]
    [D] David Donoho: Data Science at the Singularity (pushback on AGI singularity, advocates for Open Science and reproducibility)
    submitted by /u/wojcech [link] [comments]  ( 9 min )
    [R] Tensor Programs VI: Feature Learning in Infinite-Depth Neural Networks
    Paper: https://arxiv.org/abs/2310.02244 Abstract: By classifying infinite-width neural networks and identifying the optimal limit, Tensor Programs IV and V demonstrated a universal way, called μP, for widthwise hyperparameter transfer, i.e., predicting optimal hyperparameters of wide neural networks from narrow ones. Here we investigate the analogous classification for depthwise parametrizations of deep residual networks (resnets). We classify depthwise parametrizations of block multiplier and learning rate by their infinite-width-then-depth limits. In resnets where each block has only one layer, we identify a unique optimal parametrization, called Depth-μP that extends μP and show empirically it admits depthwise hyperparameter transfer. We identify feature diversity as a crucial factor in deep networks, and Depth-μP can be characterized as maximizing both feature learning and feature diversity. Exploiting this, we find that absolute value, among all homogeneous nonlinearities, maximizes feature diversity and indeed empirically leads to significantly better performance. However, if each block is deeper (such as modern transformers), then we find fundamental limitations in all possible infinite-depth limits of such parametrizations, which we illustrate both theoretically and empirically on simple networks as well as Megatron transformer trained on Common Crawl. Interesting, great to see this line of work continued, muP was great, now Depth-muP submitted by /u/_puhsu [link] [comments]  ( 9 min )
  • Open

    AI: Voice cloning tech emerges in Sudan civil war
    A campaign using AI voice cloning technology to impersonate Omar al-Bashir, the former leader of Sudan, has gained attention on TikTok. The anonymous account has been posting what it claims are 'leaked recordings' of the ex-president, despite Bashir not being seen in public for a year and being believed to be seriously ill. Experts warn that campaigns like this demonstrate how new tools can quickly and cheaply distribute fake content through social media. The authenticity of the recordings has been questioned, but evidence suggests that voice conversion software has been used to mimic Bashir's voice. TikTok has taken down the account, stating that it violated their guidelines on posting false content and the use of synthetic media. Source : https://www.bbc.co.uk/news/world-africa-66987869 submitted by /u/NuseAI [link] [comments]
    When AI tells you what you want to hear, even if it knows it's not true ... A Bard example
    I love Bard. It eloquently tells me things in a way that meets and exceeds my expectations, and even more than GPT-4. But what is Google's strategy behind programming it to say things that it knows are not true? Do they train it to say what the user wants to hear? If so, that's disconcerting to say the least, although could be a brilliant way to comb users for enhancement requests. Here is my most recent experience, a humorous example which I feel illustrates the point perfectly: Me: Bard, can you take a .wav audio file and create a phoneme transcript using the IPA symbol library? Bard: Yes, I can work from a .wav audio file to transcribe your speech into IPA phonemes. You can upload the file to a cloud storage service such as Google Drive. Me: Great. Here's the link to the file: <link…
    Just created /r/thisiswhyaiwasmade for the more silly side of AI generated content
    Join us! /r/thisiswhyaiwasmade is where we have fun and celebrate the enjoyable and entertaining side of the tech. Need a place to put your silly and playful prompts, AI generated content, ideas for ways the bend the rules of AI? This is the place! submitted by /u/unknowingafford [link] [comments]
    CGPT-4, aside from solving climate change, what is the most positive thing that AI can do for the world?
    In the context of maximizing positive impact, AI has transformative potential in healthcare. Let's start with personalized medicine. AI can analyze vast data sets, consisting of genetic, environmental, and lifestyle variables, to tailor medical treatments for individuals. This level of personalization could substantially increase treatment effectiveness and reduce adverse side effects. We're not just talking about better pills; we're envisioning targeted gene therapies and custom-tailored interventions that could be more curative rather than merely palliative. Beyond personalization, AI can democratize healthcare access. Remote areas without specialists could have AI systems that diagnose and recommend treatments, making quality healthcare accessible without the need for extensive travel. Telemedicine could be supercharged by AI, offering medical consultations and second opinions to anyone with internet access. Moreover, AI could lead breakthroughs in drug discovery and epidemiology. It could analyze complex biochemical interactions at a speed unattainable by humans, slashing the time and cost to bring new drugs to market. On a larger scale, AI could monitor and model the spread of diseases, providing actionable insights for containment and treatment strategies. So, AI in healthcare is not merely an incremental improvement. It's a paradigm shift that could equalize healthcare access and significantly extend human life while improving its quality. All these advancements could happen within our lifetime, changing the face of medicine and, by extension, society. submitted by /u/Georgeo57 [link] [comments]
    What's the difference between a human's brain and AI?
    Functioning. Humans use the brain's computing power, memory, and ability to think, whereas AI-powered machines rely on data and specific instructions fed into the system. Besides, it takes a very long time for humans to process and understand the problems and gets accustomed to them. submitted by /u/Virtual-Study-Campus [link] [comments]
    6 AI Apocalypse Scenarios And Why They're Wrong
    submitted by /u/arrowoftime [link] [comments]
    How to use custom instructions for ChatGPT like a Pro (Ultimate Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]
    DeepMind cofounder is tired of ‘knee-jerk bad takes’ about AI
    Mustafa Suleyman, the cofounder of DeepMind and CEO of Inflection AI, discusses his concerns about AI risks and the need for precaution. He believes that while some extreme scenarios may be over the top, it's important to treat powerful technologies with caution. Suleyman highlights the middle layer of AI risks that people often underestimate, which involves the amplification of goals for both good and bad actors. He emphasizes the need to contain AI to prevent potential negative consequences. Suleyman talks about the balance between risks and opportunities in technology and the importance of considering both aspects. He mentions the hype around generative AI and the need to look beyond the surface to understand its true potential. Suleyman discusses the discussions with lawmakers about AI and the challenge of bridging the gap between policy makers and tech experts. Source : https://venturebeat.com/ai/deepmind-cofounder-is-tired-of-knee-jerk-bad-takes-about-ai/ submitted by /u/NuseAI [link] [comments]
    Does Sam Altman Know What He’s Creating?
    submitted by /u/norcalnatv [link] [comments]
    DeepMind, Univ. of Illinois: Is self-correction a viable method to improve LLM reasoning? Probably not.
    Can LLMs actually improve their own reasoning by self-correcting mistakes? A new paper from DeepMind and the University of Illinois looks to answer this quantitatively. The results show that unaided, LLMs struggle at self-correction for reasoning tasks. The core issue is LLMs have trouble reliably evaluating the correctness of their own responses. They rarely identify flaws in initial reasoning. Sometimes LLMs even alter initially correct responses to become incorrect after self-correction! (I've personally seen this when interacting with ChatGPT many times and you probably have too). More complex techniques like critiquing between LLM instances don't help much either. External feedback or guidance looks necessary to improve reasoning (Well, some interesting parallels to this paper here about implicit improvement from preference data vs traditional RLHF). Self-correction does show promise for things like making responses more polite or safe though. Criteria there are more clear-cut. The authors argue we need to balance enthusiasm with realistic expectations on self-correction. It has a lot of limits for improving reasoning (at least with current models). But they suggest promising directions like incorporating high-quality external feedback from humans, training data, and tools. That could be key to unlocking self-correction's potential down the road. TLDR: Basically title... LLMs can't reliably self-correct reasoning yet. Maybe hybrid techniques combining self-correction with external guidance could work but we need more research. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]
    I need help finding a tool
    Buddy no of the tool where I can take an image have an AI translated and replace the text with the same style and have it in the new language like for example translating a Japanese image to English and have it look exactly the same just in English I'm looking for a free one that doesn't require credits it can be a desktop app or a website doesn't matter just needs to be free submitted by /u/agentduckman12 [link] [comments]
    How much do I have to edit AI generated images to become my own IP?
    Hey there! I'm a 1-man card game designer and while juggling the project as well as mt senior year of college, I have been relying heavily on AI-generated artwork to speed up my workflow with some illustrations and other forms of world-building. In regards to the recent legal decisions (in the US), in which any work produced by AI cannot be copyrighted, how much do I need to change the illustrations to become my own, if I even can at all? Thanks! Edit for clarity: I am also an illustrator. So this question comes from the perspective of an artist trying to save time and energy for other projects submitted by /u/Luke192 [link] [comments]
    Comparative Evaluation of Fine-Tuned and Standard Language Models in Emulating Living Historical Figures: A Detailed Study Proposal
    submitted by /u/alcanthro [link] [comments]
    JPMorgan CEO Jamie Dimon: AI will lead to 3.5-day workweek | Fortune
    Jamie Dimon says the next generation of employees will work 3.5 days a week and live to 100 years old submitted by /u/AminoOxi [link] [comments]
    Google unveils Pixel 8 built for 'the generative AI era' | CNN Business
    submitted by /u/pehnsus [link] [comments]
  • Open

    Improve prediction quality in custom classification models with Amazon Comprehend
    In this post, we explain how to build and optimize a custom classification model using Amazon Comprehend. We demonstrate this using an Amazon Comprehend custom classification to build a multi-label custom classification model, and provide guidelines on how to prepare the training dataset and tune the model to meet performance metrics such as accuracy, precision, recall, and F1 score.  ( 8 min )
    Fast and cost-effective LLaMA 2 fine-tuning with AWS Trainium
    Large language models (LLMs) have captured the imagination and attention of developers, scientists, technologists, entrepreneurs, and executives across several industries. These models can be used for question answering, summarization, translation, and more in applications such as conversational agents for customer support, content creation for marketing, and coding assistants. Recently, Meta released Llama 2 for both […]  ( 7 min )
  • Open

    New tools are available to help reduce the energy that AI models devour
    Amid the race to make AI bigger and better, Lincoln Laboratory is developing ways to reduce power, train efficiently, and make energy use transparent.  ( 11 min )
  • Open

    OpenAI's justification for why training data is fair use, not infringement [pdf]
    submitted by /u/nickb [link] [comments]
    Traveling Words: A Geometric Interpretation of Transformers
    submitted by /u/nickb [link] [comments]
  • Open

    Tangent sum
    When I was writing my post on lemniscate functions yesterday, a line from the Wikipedia article seemed familiar for reasons I cannot place. Defining a tangent-sum operator as a ⊕ b := tan(arctan ⁡ a + arctan ⁡ b) gives cl² z ⊕ sl² z = 1. I feel like I’ve seen this tangent-sum used before, but […] Tangent sum first appeared on John D. Cook.  ( 6 min )
    Enriched categories
    We begin with a couple examples. First, the set of linear transformations from one vector space to another is itself a vector space. Second, the set of continuous linear operators from one Banach space to another is itself a Banach space. Or maybe better, this set can be made into a Banach space. In the […] Enriched categories first appeared on John D. Cook.  ( 6 min )
    p-norm trig functions and “squigonometry”
    This is the fourth post in a series on generalizations of sine and cosine. The first post looked at defining sine as the inverse of the inverse sine. The reason for this unusual approach is that the inverse sine is given in terms of an arc length and an integral. We can generalize sine by […] p-norm trig functions and “squigonometry” first appeared on John D. Cook.  ( 5 min )
    Geometric derivation of hyperbolic trig functions
    This is the third post in a series on generalizing sine and cosine. The previous post looked at a generalization of the sine and cosine functions that come from replacing a circle with a lemniscate, a curve that looks like a figure eight. This post looks at replacing the circle with a hyperbola. On the […] Geometric derivation of hyperbolic trig functions first appeared on John D. Cook.  ( 5 min )
  • Open

    HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world
    HoloAssist is a new multimodal dataset consisting of 166 hours of interactive task executions with 222 participants. Discover how it offers invaluable data to advance the capabilities of next-gen AI copilots for real-world tasks. The post HoloAssist: A multimodal dataset for next-gen AI copilots for the physical world appeared first on Microsoft Research.  ( 10 min )
    Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas
    Connecting with researchers, collaborating across disciplines, and exploring a new city—PhD students Jennifer Scurrell and Alejandro Cuevas talk to Senior Researcher Madeleine Daepp about the internship experience at Microsoft Research. The post Intern Insights: Dr. Madeleine Daepp with Jennifer Scurrell and Alejandro Cuevas appeared first on Microsoft Research.  ( 29 min )
  • Open

    Brains of the Operation: Atlas Meditech Maps Future of Surgery With AI, Digital Twins
    Just as athletes train for a game or actors rehearse for a performance, surgeons prepare ahead of an operation. Now, Atlas Meditech is letting brain surgeons experience a new level of realism in their pre-surgery preparation with AI and physically accurate simulations. Atlas Meditech, a brain-surgery intelligence platform, is adopting tools — including the MONAI Read article >  ( 7 min )
    Fall in Line for October With Nearly 60 New Games, Including Latest Game Pass Titles to Join the Cloud
    October brings more than falling leaves and pumpkin spice lattes for GeForce NOW members. Get ready for nearly 60 new games to stream, including Forza Motorsport and 16 more PC Game Pass titles. Assassin’s Creed Mirage leads 29 new games to hit the GeForce NOW library this week. In addition, catch a challenge to earn Read article >  ( 9 min )

  • Open

    Ring Attention with Blockwise Transformers for Near-Infinite Context
    submitted by /u/nickb [link] [comments]
    Think before you speak: Training Language Models With Pause Tokens
    submitted by /u/nickb [link] [comments]
    Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
    submitted by /u/nickb [link] [comments]
    AI has been reading my mind.
    I know several people that tell me whenever they say something out loud, they start seeing it advertised to them or on their feed. But for me, if I think of certain things, even if I never said it out loud, it will appear on my feed.. has anything similar been happening to anyone else? submitted by /u/GuaranteedBigBoy [link] [comments]
  • Open

    [P] Open-source project to run locally LLMs in browser, such as Phi-1.5 for fully private inference
    Excited to introduce BlindChat (https://github.com/mithril-security/blind_chat), an open-source, privacy-centric alternative to ChatGPT for in-browser Conversational AI! We provide full local inference in browser, by using libraries from Hugging Face like transformers.js or candle for WASM inference. We have supported several small models, the latest one being Phi-1.5, the 1.3B model that beat Llama 2 7b! As Microsoft’s researchers mentioned in their paper, the model often produces incorrect code and statements. They are just suggestions, and this model is not trained for instruction tuning, so it might be harder to use than regular chat. More info on their model card (https://huggingface.co/microsoft/phi-1_5). We would love to have your feedback on our project, as we are aiming to build a privacy-first and open-source alternative to ChatGPT! submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] What is the relation between learning rate and vanishing gradient problem?
    How can we tackle vanishing gradient problem by changing the learning rate? Is it possible? submitted by /u/InternationalBack472 [link] [comments]  ( 9 min )
    [P] Torchsummary not working with your layers again? Try this lightweight alternative
    pip install output-shape It is a minimalistic and simple alternative to torchsummary with a simple print of the output shape of a layer, or custom layer. For torch.nn.MultiheadAttention, it handles both the output shape and the attn matrix separately. https://github.com/avocardio/output-shape Currently only works with PyTorch models, soon with Tensorflow / Keras as well. Jax is also on the list for later! submitted by /u/capital-man [link] [comments]  ( 9 min )
    [D] Thoughts on current Vector DB landscape?
    Hello, What are your thoughts on current Vector DB offerings? For instance: Do you think the pricing for them is reasonable/viable? Do you think there’s a sufficient level of developer/user experience? What about for those who aren’t necessarily specialized in data? If you like a managed service, why do you prefer it over the open source alternatives? submitted by /u/LucasSaysHello [link] [comments]  ( 9 min )
    [R] NeuRBF: A Neural Fields Representation with Adaptive Radial Basis Functions
    Project Page Paper Code We present a novel type of neural fields that uses general radial bases for signal representation. State-of-the-art neural fields typically rely on grid-based representations for storing local neural features and N-dimensional linear kernels for interpolating features at continuous query points. The spatial positions of their neural features are fixed on grid nodes and cannot well adapt to target signals. Our method instead builds upon general radial bases with flexible kernel position and shape, which have higher spatial adaptivity and can more closely fit target signals. To further improve the channel-wise capacity of radial basis functions, we propose to compose them with multi-frequency sinusoid functions. This technique extends a radial basis to multiple Fourier radial bases of different frequency bands without requiring extra parameters, facilitating the representation of details. Moreover, by marrying adaptive radial bases with grid-based ones, our hybrid combination inherits both adaptivity and interpolation smoothness. We carefully designed weighting schemes to let radial bases adapt to different types of signals effectively. Our experiments on 2D image and 3D signed distance field representation demonstrate the higher accuracy and compactness of our method than prior arts. When applied to neural radiance field reconstruction, our method achieves state-of-the-art rendering quality, with small model size and comparable training speed. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [D]
    Hi guys ! I am going to purchase a laptop for programming and AI tasks. I will be working on a simulation software project related to the trajectory of an object in 2d and 3d space. Which laptop will be the most suitable for these tasks and it should have high battery backup because the place where I work does not have enough power sockets. The first laptop which came into my mind was Macbook pro with M2 pro chip and Lenovo Thinkpad X1 Carbon gen 10. Suggest me the best. submitted by /u/smitherium [link] [comments]  ( 9 min )
    [Discussion] Feature Selection Algorithms
    I have only 200 samples but about 30 features. What are some effective commonly used feature selection algorithms? I want to identify the features that play the most significant role in generating outcomes. submitted by /u/Shina-pig [link] [comments]  ( 9 min )
    [R] Will a small error be determining in the final decision for my paper?
    About a week ago, I submitted my first paper into one of the most prestigious Machine Learning conferences out there. This was a last minute submission, and my supervisor and I were working on it simultaneously until the very last moment. Sadly, my supervisor committed an error when writing the mathematical definition of a certain set, slightly changing its meaning. This change, even though small, changes the definition in such a way that the subsequent theorem and its proof isn't formally correct anymore, as it assumes the original definition of the set, not the new one. How much will this affect the decision of accepting or rejecting my paper? The whole method, results and consequences are still the same, no matter this definition. It's more a problem of a "formal" nature (here "formal" as a word in the mathematical sense). Is there a other way that I can inform about this error without changing the content maybe? I know that at some point, they give a chance to edit the original paper, but I don't know if this is after the decision to accept/reject. submitted by /u/howtorewriteaname [link] [comments]  ( 9 min )
    How can I apply object detection and image segmentation functionality to my current custom-trained Image Classification model? [D]
    So, a few months ago, I started developing this deep learning model, which was made purely to differentiate whether the input image is driftwood floating in water or a crocodile. To my knowledge, I leveraged the resnet50 pre-trained SoTA model to train my deep learning model, and for that, I downloaded almost 5k images of driftwood and crocodiles for my model training. Once the training was complete, I took the next step and deployed my model on the Hugging Face Spaces app, allowing my friends to put it to the test. But here's where I ran into a significant challenge: users could even upload their own selfies, and my model would attempt to predict whether they were a crocodile or a piece of driftwood! So how can I leverage object detection or the image segmentation pipeline so that when the user inputs their image, it tries to detect the object from the photo and then detect whether the detected object from the given image contains a crocodile or not? If the crocodile or driftwood is not found then it should return "No object found" or like that. submitted by /u/meWhoObserves [link] [comments]  ( 9 min )
    [R] Large Language Models Represents Space and Time
    Paper - https://arxiv.org/abs/2310.02207 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [R] Help Shape the Future of Machine Learning: Take Our Short Survey and Let's Create Something Amazing Together!
    Hello Redditors in r/MachineLearning We are the team behind ML Workbench, an upcoming integrated platform designed to streamline your entire machine learning lifecycle. From data preprocessing and model training to validation and deployment, we aim to make the process as seamless as possible. But here's the thing: we need your insights to build something that truly resonates with the community and solves real-world problems. 📝 Click Here to Take the Survey Why Should You Care? Unified Experience: Imagine managing all your ML tasks in one integrated environment. High-Performance Computing: We're leveraging powerful A100 GPUs to accelerate your work. User-Centric Design: Whether you're a beginner or a pro, the platform is designed to cater to all skill levels. Collaboration: Built-in features to make team collaboration effortless. What's in the Survey? The survey contains questions about your current challenges, the tools you use, and what you'd love to see in an ML platform. It should only take about 5-10 minutes to complete. Thank You Gift As a small token of our appreciation, we're offering exclusive early access to the platform for selected participants. Don't miss this chance to be among the first to experience what we're building! 📝 Click Here to Take the Survey Your feedback is crucial for us to create a tool that we hope will make a significant positive impact in the machine learning community. Thank you for taking the time to read this post and participate in our survey. Cheers, The ML Workbench Team submitted by /u/nonononottodayorever [link] [comments]  ( 9 min )
    [P] Video Event Detection
    Hi, I'm looking to create a model that given a sequence of frames from a video, returns a probability distribution over a set of events that may have occurred in those frames (probably 5 - 10 events). The training data will consist of video and hand labelled frame index/event pairs. I'm not too concerned about handling simultaneous events. It would be super helpful for some suggestions on a model architecture that would yield the best results and/or good papers/examples that achieve something similar. Thanks! submitted by /u/Dredgefort [link] [comments]  ( 9 min )
    [P] Retrieval augmented generation with OpenSearch and reranking [Video tutorial]
    I created a video tutorial that tries to demonstrate that semantic search (using embeddings) is not always necessary for RAG (retrieval augmented generation). It was inspired by the following Cohere blog post: https://txt.cohere.com/rerank/ I code up a minimal RAG pipeline: OpenSearch -> Rerank -> Chat completion (without using Langchain or similar libraries) and then see how it performs on various queries. Hope some of you find it helpful. Feel free to share any feedback@ Video link: https://youtu.be/OsE7YcDcPz0 submitted by /u/mildlyoverfitted [link] [comments]  ( 9 min )
    [R] Hacking an NLP benchmark: How to score 100 points on AMR parsing
    AMR parsing is a fun task where researchers map texts onto little graphs that explicate their meaning, so called Abstract Meaning Representations (AMRs). While arguably not the top NLP benchmark regarding popularity, research has been active for the last 10 years, including at major NLP conferences such as ACL/NAACL/EACL/EMNLP etc. Funnily, I recently found some vulnerabilities in the evaluation protocol, and if we exploit these vulnerabilities, we can get the highest score on the benchmark. To get an overview over the issue (without understanding AMR), imagine a cooking contest that takes place regularly, say, once a year. In all events, we have the same judge, participants are amateurs, meals are scored on 0 to 100, with 100 meaning “it can’t possibly get better”. Over the years, the …  ( 10 min )
    [D] Looking for an article related to machine learning in medicine to be presented at a journal club
    Hi all, I'm curious if anyone has a stand-out article they believe would prompt a lively discussion in a journal club I have coming up. Something that may have people take sides, or maybe a recent breakthrough in the ML space as it relates to clinical/health care. ​ Thanks! submitted by /u/veilofosiris [link] [comments]  ( 9 min )
    [R] Think before you speak: Training Language Models With Pause Tokens
    Paper - https://arxiv.org/abs/2310.02226 submitted by /u/MysteryInc152 [link] [comments]  ( 8 min )
    [P] Good models to use for multimodal object detection when both the modalities are image based or some object detection models which support ensembling out of the box like Yolov5?
    So basically I have a dataset with images of vehicles in top down view in both RGB and IR, what are some models I can use for both unimodal and multimodal object detection to compare their performance. Links to GitHub repos would be helpful. Thanks submitted by /u/Xyber5 [link] [comments]  ( 9 min )
    [P] Using pre-trained models as features?
    Hey everyone! Currently, I am working on a project around music emotion classifcation/regression model. Basically I am trying to predict a score to each emotion on a given song. The problem is that my dataset has quite imbalanced scores (y). Most scores are centered around a certain score range. Therefore, having difficulties predicting scores that are further away of the mean values. I had this idea to bring in pre-trained (on other datasets and problems) audio classification models into this as there are a bunch of good performing pre-trained classification models out there already. The prediction of these pre-trained models should be used as features (e.g. prediction of genre, instrument etc) beside the original spectorgram in my model. I know this won't solve the problem of imbalances in the scores but I thought maybe this could improve the performance as the model would have more features to work with. Does this make sense? I appreciate any input. submitted by /u/Kniggi [link] [comments]  ( 9 min )
    [D] LOMO underrated
    Does anyone have an idea why the LOMO optimizer (low memory optimizer) which was released a few months ago is not widely available and everyone still uses either Adam or SGD? While the paper looks really promising submitted by /u/RedMoula [link] [comments]  ( 9 min )
    [P] Camera based monitoring of infant's breathing
    Hi! I recently have seen systems that monitor breathing rate of an infant through camera. I have read several articles on that topic, where people used things like 3D camera, RGB or Interferometric Radar Sensor. Do you guys have any idea on how to accurately measure this? submitted by /u/kaina_m [link] [comments]  ( 9 min )
    [R] Towards Self-Assembling Artificial Neural Networks through Neural Developmental Programs
    submitted by /u/hardmaru [link] [comments]  ( 8 min )
    [D] How Do You Track Projects in a Scaling ML Team?"
    I am part of a Machine Learning team that has experienced significant growth recently. When we were a small team, tracking projects was straightforward. However, as the team has expanded, it's become increasingly challenging to keep track of everything. We are part of a larger corporation, so we have access to tools for creating epics and boards. However, these corporate tools are too generic and don't provide the level of detail I need for internal management. Specifically, I'm looking for a way to track model versions, dataset versions, and the overall status of our projects. I'd also like to be able to assign team members to projects. Currently, we use a MIRO board, but it's disorganized and difficult to read and update. I'd love to hear what tools or strategies you've used for similar situations, especially since our team is expected to grow even more, making tracking increasingly complex. submitted by /u/Spiritual_Narwhal649 [link] [comments]  ( 9 min )
  • Open

    Lemniscate functions
    In the previous post I said that you could define the inverse sine as the function that gives the arc length along a circle, then define sine to be the inverse of the inverse sine. The purpose of such a backward definition is that it generalizes to other curves besides the circle. For example, it […] Lemniscate functions first appeared on John D. Cook.  ( 5 min )
    Generalized trigonometry
    In a recent post I mentioned in passing that trigonometry can be generalized from functions associated with a circle to functions associated with other curves. This post will go into that a little further. The equation of the unit circle is and so in the first quadrant The length of an arc from (1, 0) […] Generalized trigonometry first appeared on John D. Cook.  ( 5 min )
  • Open

    A Mine-Blowing Breakthrough: Open-Ended AI Agent Voyager Autonomously Plays ‘Minecraft’
    For NVIDIA Senior AI Scientist Jim Fan, the video game Minecraft served as the “perfect primordial soup” for his research on open-ended AI agents. In the latest AI Podcast episode, host Noah Kravitz spoke with Fan on using large language models to create AI agents — specifically to create Voyager, an AI bot built with Read article >  ( 6 min )
    How AI Helps Fight Wildfires in California
    California has a new weapon against the wildfires that have devastated the state: AI. A freshly launched system powered by AI trained on NVIDIA GPUs promises to provide timely alerts to first responders across the Golden State every time a blaze ignites. The ALERTCalifornia initiative, a collaboration between California’s wildfire fighting agency CAL FIRE and Read article >  ( 6 min )
  • Open

    LLMs May Be The Trojan Horse That Modernizes Software Development
    submitted by /u/geekteam6 [link] [comments]
    Why PepsiCo is powering your snacks with AI
    Using AI to improve Cheetos? That's something PepsiCo has experimented with. On today’s POLITICO Tech, Athina Kanioura, chief strategy and transformation officer for PepsiCo, says that using AI to make employees faster and more efficient hasn’t led PepsiCo to replace human workers as many fear. And why the company has determined that in some jobs the technology is simply off limits. Listen to the interview here: https://politico-tech.simplecast.com/episodes/why-pepsico-is-powering-your-snacks-with-ai submitted by /u/smo279 [link] [comments]
    New Paper: Enabling Language Models to Implicitly Learn Self-Improvement From Data
    LLMs keep getting more capable at generating natural language. But there's always room for improving the quality and alignment of their responses. Typically this requires lots of human effort to collect more training data. So researchers are exploring ways for models to self-improve without human involvement. Many methods use prompting - giving the LLM instructions to critique and refine its responses. But coming up with comprehensive prompts is challenging. The new approach proposed, called PIT, lets models learn self-improvement implicitly from human preference data instead. It reformulates reinforcement learning to maximize the gap between an original response and improved response conditioned on the original. This taps into the implicit guidance in the preference data on what constitutes better quality, so no manual rubrics are needed. PIT uses curriculum reinforcement learning - first improving easy references, then switching to the LLM's own samples. Experiments on real and synthetic datasets show PIT significantly outperforms prompting methods like Self-Refine. It improved response quality 7-34% across conditions without any human involvement. This demonstrates a promising direction for LLMs to align better with human preferences autonomously as they learn from experience. No need for human bottlenecks when expanding to new domains or underserved use cases. Very cool! TLDR: New method PIT enables LLMs to implicitly learn to refine themselves from human preference data, no prompts needed. Big improvement over prompting approaches. Full Summary Arxiv is here: https://arxiv.org/abs/2310.00898 submitted by /u/Successful-Western27 [link] [comments]
    $5k in grants or $250k funding for AI startups. Backed by OG's
    AI Grant is offering $5k in grants or $250k in funding for AI startups. The program is backed by OG's AI Grant, an accelerator for AI startups. The grant includes an uncapped SAFE investment of $250,000 for AI-native product startups, $350,000 in Azure credits, a summit in San Francisco with advisors and founders, and various other startup benefits and credits. The program was created by Nat Friedman and Daniel Gross. Applications for Batch 3 will open in a few months, but early applications are accepted. The program is open to anyone, and it is looking for companies or projects that leverage AI models in a useful or engaging way. Source : https://aigrant.com/ submitted by /u/NuseAI [link] [comments]
    AI will teach everyone to read and write. It's already begun.
    https://www.imagineworldwide.org/ "What is Child-Directed, Tech-Enabled Learning? Children drive their own learning, at their own pace, using software that provides a complete, research-based curriculum and pedagogy. Adults play a supportive, facilitative role. The software is delivered to the learner on a tablet, without connectivity, and charged by solar power or other appropriate energy sources... With hundreds of millions of children out of school or lacking access to effective schooling, this model can provide every child, everywhere access to learning. Solutions can work without internet access or grid power. Adults play facilitative, rather than instructional, roles. The annual unit cost of the learning solution is less than $7 per child and declining. This includes hardware, software, accessories, power, shipping, and implementation support from Imagine." submitted by /u/Georgeo57 [link] [comments]
    AI is replacing customer service jobs across the globe
    Artificial intelligence (AI) is replacing customer service jobs around the world, with chatbots being used to interact directly with customers and solve problems independently. This shift is expected to have a profound effect on economies, particularly in countries like India and the Philippines where call centers provide millions of jobs. While some argue that AI will provide support to remaining call center workers and improve job satisfaction, others warn that it could lead to job losses and a need for workforce adaptation. The use of AI software tools in call centers has shown potential for improving productivity and customer satisfaction. Source : https://www.washingtonpost.com/technology/2023/10/03/ai-customer-service-jobs/ submitted by /u/NuseAI [link] [comments]
    Female-founded AI startups win just 2% of funding deals in UK
    Female-founded AI startups in the UK account for just 2% of funding deals over the past decade, according to a report by the Alan Turing Institute. When female-founded companies do secure funding, they raise an average of £1.3m per deal, compared to £8.6m raised by all-male founder teams. The report highlights the urgent need for gender balance in AI investment, as the industry is predicted to grow significantly in the coming years. Recommendations to improve gender balance include improving recruitment, monitoring investment practices, and diversifying the ecosystem. There is an increasing demand for generative AI products, with leading tech companies investing heavily. Gender diversity gaps and uneven progress rates for ethnic and racial groups are observed across investment firms. AI products have shown biases, such as passport checkers working less efficiently with darker skin and tools reinforcing gender stereotypes. In 2019, a UN agency found that assigning female genders to digital assistants like Siri and Alexa perpetuated harmful gender biases. Source : https://www.theguardian.com/technology/2023/oct/04/female-founded-ai-startups-win-just-2-of-funding-deals-in-uk submitted by /u/NuseAI [link] [comments]
    I used Riffusion (Stable Diffusion, but for music) to turn my own music into "jazz", "Radiohead", "Muse" or "Nirvana" songs, I'm amazed by the results
    submitted by /u/cI_-__-_Io [link] [comments]
    Visa Announces $100 Mn Fund for Generative AI Companies
    submitted by /u/Agitated-Spell3979 [link] [comments]
  • Open

    My Impressions (and Application) of the Heidelberg Laureate Forum 2023
    This September, I had the chance to attend the Heidelberg Laureate Forum (HLF) for the second — and probably last — time. The HLF is an incredible experince for young researchers: Mirroring the Lindau Nobel Laureate Meetings, the organizers invite laureates from math and computer science together with young researchers pursuing their undergraduate, graduate or post-doc studies. In this article, I want to share impressions and encourage students to apply next year! The post My Impressions (and Application) of the Heidelberg Laureate Forum 2023 appeared first on David Stutz.  ( 7 min )
  • Open

    Simplify medical image classification using Amazon SageMaker Canvas
    Analyzing medical images plays a crucial role in diagnosing and treating diseases. The ability to automate this process using machine learning (ML) techniques allows healthcare professionals to more quickly diagnose certain cancers, coronary diseases, and ophthalmologic conditions. However, one of the key challenges faced by clinicians and researchers in this field is the time-consuming and […]  ( 11 min )
    Create an HCLS document summarization application with Falcon using Amazon SageMaker JumpStart
    Healthcare and life sciences (HCLS) customers are adopting generative AI as a tool to get more from their data. Use cases include document summarization to help readers focus on key points of a document and transforming unstructured text into standardized formats to highlight important attributes. With unique data formats and strict regulatory requirements, customers are […]  ( 9 min )
    Automate prior authorization using CRD with CDS Hooks and AWS HealthLake
    Prior authorization is a crucial process in healthcare that involves the approval of medical treatments or procedures before they are carried out. This process is necessary to ensure that patients receive the right care and that healthcare providers are following the correct procedures. However, prior authorization can be a time-consuming and complex process that requires […]  ( 7 min )
  • Open

    Scalable spherical CNNs for scientific applications
    Posted by Carlos Esteves and Ameesh Makadia, Research Scientists, Google Research, Athena Team Typical deep learning models for computer vision, like convolutional neural networks (CNNs) and vision transformers (ViT), process signals assuming planar (flat) spaces. For example, digital images are represented as a grid of pixels on a plane. However, this type of data makes up only a fraction of the data we encounter in scientific applications. Variables sampled from the Earth's atmosphere, like temperature and humidity, are naturally represented on the sphere. Some kinds of cosmological data and panoramic photos are also spherical signals, and are better treated as such. Using methods designed for planar images to process spherical signals is problematic for a couple of reasons. Firs…  ( 92 min )
  • Open

    Why DQN method is only suitable for small discrete action space? What is the issue if action space is large and continous?
    submitted by /u/aabra__ka__daabra [link] [comments]
    Up to date Metaworld documentation
    Hello everyone, I want to start experimenting with the domain of multi-tasking and meta-learning, thus I pip installed metaworld which is currently on version 2.0.0 if I'm not mistaken. I wanted to ask in case anybody knows, if there's any recent updated documentation, because the farama foundation on GIthub which is probably responsible for maintaining the Metaworld, has outdated code and documentation. (for example, presented code on Github's README has the command env.step(a) which returns 4 values instead of 5 that newer version outputs). From what I understand, they gather contributors for a big push regarding code and documentation on GItHub, where they will make up things up to date again but this announcement was 7 months ago. Sorry for the potentially wrong format of this question-post, I'm relatively new to reddit. I would appreciate any further knowledge on this topic and thanks everyone who's taking the time to read it! ​ Metaworld Distribution from Farama Foundation on Github: https://github.com/Farama-Foundation/Metaworld submitted by /u/South_Book_5625 [link] [comments]
    The future of game testing is here, and it is powered by Artificial Intelligence! 🔥
    Hi everyone! We used our opensource library SheepRL 🐑 and our PyTorch implementation of DreamerV3 on Crafter, an open-world survival game, featuring randomly generated 2D worlds, in which players have the freedom to explore a large and expansive map and need to forage for food, collect materials, build tools and find shelter. Here is a short video 👉 https://youtu.be/7XEBT2msUUQ In open-world games, ensuring they are playable and bug-free is crucial, but is becoming increasingly difficult and time-consuming using manual game testing. Maximizing exploration using Reinforcement Learning is extremely useful for testing games at scale, because of the wide variety of gameplay scenes the player may encounter. Why is the test on Crafter so interesting for game testing? Because Crafter evaluates a large number of general capabilities related to the RL agent, like strong ability to generalise (new generated maps for each episode), to deal with partial observability (each input image reveals only a small part of the world) and to long-term reasoning and survival. These abilities are very useful for testing games at scale, providing developers with insights to optimise gameplay and player experience. The future of game testing is here, and it is powered by Artificial Intelligence! 🔥 --- ❌ Are you interested in joining the project community? Get in touch 👉 https://github.com/Eclectic-Sheep/sheeprl ❌ SheepRL 🐑 is open-source, fully written in PyTorch and accelerated with LightningFabric - by Lightning AI. Feel free to use it for your AI projects, and if you want to contribute, we are more than happy to accept your pull requests! ❤️ submitted by /u/Manu_Orobix [link] [comments]
    Can I use Continuous algorithms (e.g. TD3) for Discrete Action spaces?
    My environment has hybrid action spaces and I was wondering if I can use continuous algorithms for discrete action spaces. I'm asking this because, well, agent can't learn and I'm trying to find the source of error. I was wondering if this was the source of problem. ​ My Assumptions On Solving This Problem: - Discrete is subspace of continuous, thus continuous algorithms will be able to handle discrete action spaces as well. - A non-hybrid action-space algorithm will be simpler than hybrid-action-space algorithms. ​ Method (I'm only describing the discrete action here): - Use TD3 as the training algorithm. No modification from the original training code. TD3 algorithm has been verified on Pendulum and other environments created for unit test purposes. - Policy network outputs the a…

  • Open

    Video Game Voice Actors Are Ready to Strike over AI. Here’s Why
    Video game voice actors are prepared to go on strike over the use of AI in game development. The current contract negotiations between the Screen Actors Guild-American Federation of Television and Radio Artists (SAG-AFTRA) and video game companies have stalled, with the major issues being pay raises and the use of AI to alter or generate actors' performances. SAG-AFTRA wants protections for its members to ensure their work is not stolen or replaced by AI. If negotiations don't progress, voice actors, stunt artists, and motion capture performers could potentially go on strike, leading to delays in game releases and recasting of beloved performers. The voice actors' strike in 2016 resulted in improvements to pay, and now they are prepared to strike again to fight for their rights. Video game performances are often seen as assets to be extracted and inserted into games, rather than recognizing the humanity and quality of life of the performers. The use of AI in game development raises concerns about how companies will use advances in generative AI to steal work or put performers out of a job. SAG-AFTRA wants transparency, consent, and compensation when it comes to the use of AI in games. Members of SAG-AFTRA have voted in favor of authorizing a strike, meaning voice actors, stunt artists, and motion capture performers could potentially join the picket line if negotiations don't progress. The strike could lead to delays in upcoming game releases and the recasting of performers if companies refuse to meet the union's demands. The fight for voice actors' rights is an existential one, as they want to retain the rights to their own voices and images and achieve wages that keep up with inflation Source : https://kotaku.com/sag-aftra-strike-voice-actor-spider-man-ai-union-1850874117 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    [Question] Any 3X AI?
    Wanted to see if there are any 3X AI generated images available? I’m looking to see how I could use AI to generate images for my website. submitted by /u/IamMoe8868 [link] [comments]  ( 8 min )
    TikTok ran a deepfake ad of an AI MrBeast hawking iPhones for $2
    TikTok ran an ad featuring a deepfake of MrBeast offering iPhone 15 Pros for $2. AI-generated deepfake content is becoming more pervasive on social media platforms. Platforms like TikTok are facing challenges in moderating and handling the rise of AI deepfakes. MrBeast raised concerns about the ability of social media platforms to handle AI deepfakes. TikTok removed the ad and associated account for policy violations. Unauthorized AI-generated content featuring celebrities is a growing problem in platform advertising. The issue is expected to worsen as AI technology improves and becomes more accessible. Transparency and disclosure are crucial in AI-generated ad content featuring celebrities. TikTok is aware of the pervasiveness of AI-generated content on its platform and is taking steps to address it. Source : https://www.businessinsider.com/tiktok-ran-deepfake-ad-mrbeast-as-ai-generated-content-spreads-2023-10 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Infinitia will apparently let you create your own AI enabled social simulations
    Came across this upcoming game which supposedly let's you create your own worlds and characters to live in the world...they also released a research paper explaining how they're doing it, using LLMs in all sorts of ways, primarily for reasoning and language. I think it could be a pretty fun take on passive games, just populating a world with your characters, checking up on them occasionally, putting them in weird situations lol. infinitia.ai for those who wanna check it out The NPCs do seems to be acting in an interesting way, as i saw in this video they posted on twitter... https://twitter.com/infinitia_app/status/1707102187518628245 ​ Watchall think? Another smallville clone? or something interesting.... submitted by /u/SeaJeweler3723 [link] [comments]  ( 9 min )
    Efficient AI design of robots.
    submitted by /u/DrJosh [link] [comments]  ( 8 min )
    From Stone to Silicon: The Odyssey of Humanity and Technology
    submitted by /u/Einsof__ [link] [comments]  ( 8 min )
    Don't Worry, AI Cannot Takeover the World, It Will Run Out of Battery
    The article discusses the importance of batteries in AI technology and how they limit the capabilities of AI robots. It explores the challenges of current battery technology and the need for better solutions. The article emphasizes the significance of developing ideal batteries that can provide long-lasting power without degradation. Source : https://notes.arkinfo.xyz/p/dont-worry-ai-cannot-takeover-the submitted by /u/NuseAI [link] [comments]  ( 9 min )
    GPT-4 outperforms its rivals in new AI benchmark suite GPT-Fathom
    ByteDance and the University of Illinois researchers have developed an improved benchmark suite with consistent parameters, called GPT-Fathom, that indicates GPT-4, the engine behind the paid version of ChatGPT, significantly outperforms leading LLMs, including its biggest competitor, Claude 2. For the latest advancements in AI, look here first. ​ https://preview.redd.it/v4fo8zser0sb1.png?width=1292&format=png&auto=webp&s=7e29fe9ac1af3efcb936ee61e9202717eed7e702 GPT-Fathom's breakthrough The new benchmark suite, GPT-Fathom, addresses consistent settings issues and prompt sensitivity, attempting to reduce inconsistencies in LLM evaluation. In a comparison using GPT-Fathom, GPT-4 outperformed over ten leading LLMs, crushing the competition in most benchmarks, and showing significant performance leaps from GPT-3 to its successors. Performance specifics The gap in performance was especially pronounced against Claude 2, ChatGPT's biggest rival. GPT-4's Advanced Data Analysis model exhibited superior performance in coding, giving it an edge as compared to LuckLlama 2, the current best-performing open-source model. Llama 2-70B showed comparable or better performance than gpt-3.5-turbo-0613 in safety and comprehension but displayed worse performance in "Mathematics", "Coding", and "Multilingualism". The seesaw effect The research team noted a 'seesaw effect' where an improvement in one area can lead to degradation in another. For instance, GPT-4 saw a performance drop on the Mathematical Geometry Simple Math (MGSM) benchmark, despite improving its performance significantly on the text comprehension benchmark DROP. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and developments in AI. Professionals from Meta, Google, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Runway has launched Gen 2 Director mode. The speed at which this company works is Insane
    submitted by /u/First_Development101 [link] [comments]  ( 8 min )
    How Ai Could Upend Foreign Policy - An Interview with Ian Bremner and Mustafa Suleyman
    submitted by /u/finger_puppet_self [link] [comments]  ( 8 min )
    CGPT-4 on some ways "prompt/response" posts advance AI
    Prompt: Explain why posting prompts and your responses speeds up AI development in the following ways: It allows us to analyze responses to prompts that we would probably never have thought of. It allows us to share ideas that the open source community would advance. It allows us to better democratize the process of alignment, bringing in as many people as possible. CGPT-4: Posting prompts and AI responses serves multiple functions that are beneficial to the progress of AI development. First off, it exposes AI to a myriad of queries that developers might not have otherwise considered. This wide range of prompts allows the machine to improve its natural language processing abilities, making it more versatile and effective in interpreting and responding to human language. The more dive…  ( 9 min )
    A.I Makes a Video game on the App Store
    submitted by /u/usmansid98 [link] [comments]  ( 8 min )
    Infinite context windows? Streaming LLMs can be extended to infinite sequence lengths without any fine-tuning.
    LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on long texts exceeding their training length. I checked out a new paper investigating why windowed attention fails for this. By visualizing the attention maps, the researchers noticed LLMs heavily attend initial tokens as "attention sinks" even if meaningless. This anchors the distribution. They realized evicting these sink tokens causes the attention scores to get warped, destabilizing predictions. Their proposed "StreamingLLM" method simply caches a few initial sink tokens plus recent ones. This tweaks LLMs to handle crazy long texts. Models tuned with StreamingLLM smoothly processed sequences with millions of tokens, and were up to 22x faster than other approaches. Even cooler - adding a special "[Sink Token]" during pre-training further improved streaming ability. The model just used that single token as the anchor. I think the abstract says it best: We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. TLDR: LLMs break on long convos. Researchers found they cling to initial tokens as attention sinks. Caching those tokens lets LLMs chat infinitely. Full summary here Paper link: https://arxiv.org/pdf/2309.17453.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Where do I produce free intro and outro AI music for my Podcast for free.
    I am starting a podcast on Psychology and Philosophy submitted by /u/21bce [link] [comments]  ( 8 min )
    BackerKit Will Restrict the Use of AI Art
    Crowdfunding site BackerKit has announced a new policy that restricts the use of solely AI-generated content on its platform. The policy aims to address concerns regarding ownership of content, ethical sourcing of data, and compensation for the process of creating content. Projects that lack a minimum requirement of human input will not be allowed to crowdfund on the BackerKit site. There is some flexibility with AI generative fill and the use of AI transcription services, but a high level of human input is required to satisfy the policy. BackerKit will automatically exclude all content uploaded by creators for their projects from AI training in support of this policy. The new restrictions will go into effect on October 4, giving creators time to alter their projects if they are using AI-generated images and text. Source : https://gizmodo.com/backerkit-ai-art-new-policy-crowdfunding-generative-1850891882 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/2/2023
    iPhone designer Jony Ive is reportedly talking to OpenAI CEO Sam Altman about making an AI hardware device.[1] Visa announced today that it plans to invest $100 million in companies developing generative AI technologies and applications “that will impact the future of commerce and payments.”[2] More than 40% of labor force to be affected by AI in 3 years, Morgan Stanley forecasts. [3] Tom Hanks: Don't fall for "AI version of me" promoting dental plan.[4] Sources: [1] https://www.businessinsider.com/chatgpt-head-iphone-designer-jony-ive-ai-device-openai-report-2023-9?amp [2] https://techcrunch.com/2023/10/02/visa-earmarks-100m-to-invest-in-generative-ai-companies/ [3] https://www.cnbc.com/2023/10/02/more-than-40percent-of-labor-force-to-be-impacted-by-ai-in-three-years-morgan-stanley-forecasts.html [4] https://www.cbsnews.com/amp/news/tom-hanks-ai-version-of-me-promoting-dental-plan/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    [D] What are some effective dimensionality reduction (unsupervised feature selection) techniques for a high dimensional, sparse dataset?
    I am considering comparing mutual information scores, but I also don't think I understand MI well enough. For example, I(X;Y) = H(X) + H(Y) - H(X,Y). To me, visualizing H(X) and H(Y) as venn diagrams and H(X,Y) as the information from both X, Y (like an overlapping venn diagram) makes me think that when X, Y are disjoint, then MI is 0 and when X, Y overlap completely, then the MI score will be high. So, I'm thinking that a high MI value is "bad" since this means X, Y would be redundant. I am not sure if my understanding here is correct. Another method I have tried is to binarize the data for each feature (represented as rows in my dataset) using "present" (1) and "absent" (0). The main issue I have run into doing this is that I am trying to then create a distribution to compare the fea…  ( 10 min )
    [D] Best interface to use LLMs for code: Chat or completion?
    Hi everyone, I am quite interested in understanding what are the feedback from the community in terms of interface to leverage LLMs for code productivity. Because LLMs tend to do mistake I have mostly used Chat-like interfaces, like ChatGPT, as they allow to interact with the model and converge to a conclusion. I haven't used Copilot for a while but my feeling was that it could do some boilerplate correctly but then it quickly started suggesting code that would be misleading and could actually hurt productivity. It might have changed since then but that was my feeling back then. What is your favorite option and why? View Poll submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [D] ML input data has to be derived from a larger dataset
    Hello everyone. I am curious to know if anyone has encountered a ML problem like this and if so, I seek your advice. Usually in ML classification such as the IRIS dataset, each row represents a sample and each column a parameter, right ! My problem is that my ML classification parameters have to be derived from a range of values (parent data). I have taken mean of the parent values to generate the parameters for the ML input data. This results in lower classification accuracies using Random forest and XGBoost. Has anyone encountered a similar situation like this where the data has to be generated from a range of other datasets? Is there any other way to do this? I did not find any papers or articles from the web so just asking. I can generate additional parameters from other statistics such as median, standard deviation etc. which can improve the classification accuracy but can make interpretation of the results a little weird, domain wise. I wish to avoid this if possible. submitted by /u/notmyfault7676 [link] [comments]  ( 9 min )
    [D] Book review for Meta's ML Design interview? Machine Learning System Design Interview (by Ali Aminian and Alex Xu)
    I'm preparing for the ML system design interview for Meta, and I searched for various resources. This book (ML System Design Interview (by Ali Aminian & Alex Xu)) seems like a solid structured resource that covers solutions to case studies in detail. Has anyone used it to prepare for Meta's ML System Design interview? Thoughts? Khang's book doesn't seem to have great reviews. Chip Huyen's book (Designing Machine Learning Systems: An Iterative Process for Production-Ready Applications) doesn't seem very focused on interview prep?? Also, happy to hear about other cool resources to prepare. Thanks very much! submitted by /u/irEFrienfk [link] [comments]  ( 9 min )
    [R] Open X-Embodiment: Robotic Learning Datasets and RT-X Models - DeepMind 2023 - RT-X exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms!
    Blog: https://www.deepmind.com/blog/scaling-up-learning-across-many-different-robot-types https://robotics-transformer-x.github.io/ here you can also find the Datasets and Code! Paper: https://robotics-transformer-x.github.io/paper.pdf Abstract: Large, high-capacity models trained on diverse datasets have shown remarkable successes on efficiently tackling downstream applications. In domains from NLP to Computer Vision, this has led to a consolidation of pretrained models, with general pretrained backbones serving as a starting point for many applications. Can such a consolidation happen in robotics? Conventionally, robotic learning methods train a separate model for every application, every robot, and even every environment. Can we instead train “generalist” X-robot policy that can be adapted efficiently to new robots, tasks, and environments? In this paper, we provide datasets in standardized data formats and models to make it possible to explore this possibility in the context of robotic manipulation, alongside experimental results that provide an example of effective X-robot policies. We assemble a dataset from 22 different robots collected through a collaboration between 21 institutions, demonstrating 527 skills (160266 tasks). We show that a high-capacity model trained on this data, which we call RT-X, exhibits positive transfer and improves the capabilities of multiple robots by leveraging experience from other platforms. https://preview.redd.it/oxzutrhtb1sb1.jpg?width=1693&format=pjpg&auto=webp&s=37b8b1dbf5f489dc2c8eaca4d15cb9c32ebc2660 https://preview.redd.it/ldsiwshtb1sb1.jpg?width=1494&format=pjpg&auto=webp&s=fdbf0f91c705acf11bff854f6d6af82dddd47021 https://preview.redd.it/ikk18jitb1sb1.jpg?width=1693&format=pjpg&auto=webp&s=e50b443dc4b0266a0480d54c4f92a0b708485797 https://preview.redd.it/t5wmciitb1sb1.jpg?width=1361&format=pjpg&auto=webp&s=2971fd645acb6dcbed2ca3522e311d0772c45964 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Biggest problems with ML in industry?
    For all my corporate ML engineers I have a question, what are the most annoying / biggest problems you face when developing/deploying ML in industry? This can be anywhere from data, to tuning, to even MLOPS. submitted by /u/hai_cben [link] [comments]  ( 9 min )
    [D] Difficulty with paper implementations on google colab
    I am not from CS background, my knowledge is from online courses and books. All of which used some variation of Jupyter notebook. My knowledge of code can be lacking sometimes, since I am not from CS background. I am trying to implement some computer vision paper codes on newer samples. I understand the papers, and the underlying mechanisms. However, I fail to decipher the codes provided with the associated github repository. Usually, these repository contains information on how to recreate the experiment on some specific data using shell. But I am using google Colab for this purpose, as I don't have access to GPU, and I found it impossible to recreate the experiments in the google Colab, using shell commands, let alone extend it to newer samples. I would appreciate some help in this regard, I haven't done this before, and there aren't really any tutorial/resource on how to do this. Ideally, what I am trying to do is separate the model, input some images, get the output, and interpret it. I am stuck, and I would really appreciate some help or advice in this regard. Right now I am trying to work with this paper, meta ood I would appreciate any help/advice/resource anything. I feel very lost. Thanks in Advance. submitted by /u/franticpizzaeater [link] [comments]  ( 9 min )
    Repurposing a personal desktop computer [P]
    Hello! I'm debating turning my old desktop (old CPU but relatively new GPU 3980 or 90) into a ML box that I can remote into. I'm sure people here have done something similar and I was wondering if anyone could point me towards some resources for getting it off the ground/any pitfalls to avoid/suggestions. I'm an active data scientist researcher for my job and this would just be for fun side projects but I have some pretty glaring holes in my knowledge of computers (like the best way to set this up - should I uninstall windows install unbuntu or is windows fine?) Honestly I'm sure my ignorance will be pretty apparent from the questions I'm asking/not asking so any advice anyone has would be welcome! Thanks! Sorry if this is the wrong subreddit for this sort of thing. ​ submitted by /u/shebaiscool [link] [comments]  ( 9 min )
    [R] Generative memory: generative diffusion models are equivalent to modern Hopfield nets
    https://arxiv.org/abs/2309.17290 submitted by /u/LucaAmbrogioni [link] [comments]  ( 8 min )
    [D] Stuck in Automation of AI models
    Hello everyone! ​ I'm currently working on a project and have hit a roadblock in automating the deployment of my machine-learning models. Can anyone provide guidance on the best practices or tools for streamlining the deployment process? Specifically, I'm looking to create a seamless workflow where models can be easily uploaded, deployed on the cloud, and accessible through APIs. Any insights or advice would be greatly appreciated! ​ Automation!!! submitted by /u/homelander81 [link] [comments]  ( 9 min )
    [P] The Case of the Missing Masterpiece
    Hi, I just wanted to share an applied image classification problem that I worked on a few years ago: https://vdalv.github.io/2018/09/01/missingMasterpiece.html submitted by /u/vdalv [link] [comments]  ( 9 min )
    Need to build a XAI model to explain the behaviour of an IDS [P]
    Hello, I need help from someone that knows about XAI. I have to create a XAI model to intérprete the resulta of an AI model, an MLP, that works as an IDS classifier. I have no idea on how to do It and I have been completely blocked for 2.5 years. This is the final project of my career and I just don't know how to do It, and my tutor isn't very helpful. If anyone is able to help I would explain him what I have to do and would be very grateful. Thanks for your help submitted by /u/elMandarine [link] [comments]  ( 9 min )
    [D] Optimal scheduling tool with AI/ML recommendations
    Hello all, I'm trying to plan out for a new web platform development for workforce management but have little experience. We all know that hard coding can be done for general scheduling, including manager polling shifts based on labor category, staff assignments, conflt resolving, emergency scheduling, etc. But what I want to research to is....how can I ensure that one optimal schedule is automatically computed using AI/machine learning tools so that I don't have to go through the list of hard-coded generated schedules (I’m sure these will work fine, but still want to compute one ultimate schedule). submitted by /u/Playful-Bed-2183 [link] [comments]  ( 9 min )
    [R] Break-A-Scene: Extracting Multiple Concepts from a Single Image
    ​ Break-A-Scene: Given a single image with multiple concepts, annotated by loose segmentation masks, our method can learn a distinct token for each concept, and use natural language guidance to re-synthesize the individual concepts or combinations of them in various contexts. Project Page: https://omriavrahami.com/break-a-scene/ Code is publicly released! Abstract Text-to-image model personalization aims to introduce a user-provided concept to the model, allowing its synthesis in diverse contexts. However, current methods primarily focus on the case of learning a single concept from multiple images with variations in backgrounds and poses, and struggle when adapted to a different scenario. In this work, we introduce the task of textual scene decomposition: given a single image of a scene that may contain several concepts, we aim to extract a distinct text token for each concept, enabling fine-grained control over the generated scenes. To this end, we propose augmenting the input image with masks that indicate the presence of target concepts. These masks can be provided by the user or generated automatically by a pre-trained segmentation model. We then present a novel two-phase customization process that optimizes a set of dedicated textual embeddings (handles), as well as the model weights, striking a delicate balance between accurately capturing the concepts and avoiding overfitting. We employ a masked diffusion loss to enable handles to generate their assigned concepts, complemented by a novel loss on cross-attention maps to prevent entanglement. We also introduce union-sampling, a training strategy aimed to improve the ability of combining multiple concepts in generated images. We use several automatic metrics to quantitatively compare our method against several baselines, and further affirm the results using a user study. Finally, we showcase several applications of our method. ​ submitted by /u/sgd_is_all_you_need [link] [comments]  ( 9 min )
    [R] MIT, Meta, CMU Researchers: LLMs trained with a finite attention window can be extended to infinite sequence lengths without any fine-tuning
    LLMs like GPT-3 struggle in streaming uses like chatbots because their performance tanks on long texts exceeding their training length. I checked out a new paper investigating why windowed attention fails for this. By visualizing the attention maps, the researchers noticed LLMs heavily attend initial tokens as "attention sinks" even if meaningless. This anchors the distribution. They realized evicting these sink tokens causes the attention scores to get warped, destabilizing predictions. Their proposed "StreamingLLM" method simply caches a few initial sink tokens plus recent ones. This tweaks LLMs to handle crazy long texts. Models tuned with StreamingLLM smoothly processed sequences with millions of tokens, and were up to 22x faster than other approaches. Even cooler - adding a special "[Sink Token]" during pre-training further improved streaming ability. The model just used that single token as the anchor. I think the abstract says it best: We introduce StreamingLLM, an efficient framework that enables LLMs trained with a finite length attention window to generalize to infinite sequence length without any fine-tuning. We show that StreamingLLM can enable Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with up to 4 million tokens and more. TLDR: LLMs break on long convos. Researchers found they cling to initial tokens as attention sinks. Caching those tokens lets LLMs chat infinitely. Full summary here Paper link: https://arxiv.org/pdf/2309.17453.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Really good dataset for a Course Capstone
    Hey everyone! My friends and I are taking a Data Science course in our university. We are modestly versed in ML/DL techniques, and want to use everything we know on a really good capstone project for this course. We are looking for a dataset where we can demonstrate a nice variety of techniques to really blow the socks off our Professor. Ideally we'd like this to be stemming from something basic that most would consider "Data Science", as in something with a tabular dataset and elements of classification. Though we still want chances to bring in what we know from outside the course: for example, if there's images to supplement the dataset we could use Image Classification models or something multimodal to bring in more features, if there's natural language data then we could use LLMs to extract salient features etc. More importantly though, we want something whose exploration can be really motivated so it doesn't seem we're only in it for the ML aspect. Thank you! submitted by /u/Subject-Revolution-3 [link] [comments]  ( 9 min )
    [D] Competitiveness in ML research
    I've been diving deep into the world of machine learning research, and I'm genuinely baffled: how on Earth do some researchers seem to pump out paper after paper? I mean, there's only 24 hours in a day, right? Are academic minions (i.e. PhD students) doing all the heavy lifting? Or maybe some highly efficient workflows I'm not privy to? On a more serious note, I would like a career in ML, and the sheer volume and pace of these publications is making me feel a bit disheartened. How is this prolificity possible? Any words of encouragement or advice? submitted by /u/blabboy [link] [comments]  ( 9 min )
    [D] Why should I use a hosted/cloud VectorDB solutions over a serverless or vector store?
    Why the hell should i use cloud based or server hosted solution over a easy peasy servless variant like lancedb or even faiss vector store is enough for most of the use cases on small-medium I often see posts like "oh my stack is... pinecone Chroma weaviate_io" And they just ingest minisets of data, what the hell man submitted by /u/Dear_Bullfrog193 [link] [comments]  ( 9 min )
    [P] FontoGen: generating true-type fonts
    I'd like to share a project that I've spent a few weekends working on. FontoGen is an autoregressive encoder-only transformer model that's capable of generating true-type fonts. GitHub: https://github.com/SerCeMan/fontogen Weights: https://huggingface.co/SerCe/fontogen Blog post with more details: https://serce.me/posts/02-10-2023-hey-computer-make-me-a-font The project is largely an exploration of whether generating fonts natively, line by line, is possible. I'm not aware of any previous research that would achieve the same results for complete fonts previously. This is my first ML-specific project, and I would appreciate any feedback on the model architecture, and I'm also happy to answer any questions you may have. submitted by /u/SerCeMan [link] [comments]  ( 9 min )
    [D] What happens after removing the causal mask of LLaMA?
    The causal mask in LLaMA serves as a protective barrier to prevent information leakage. However, in certain tasks, leveraging information leakage can be a beneficial strategy for enhancing performance, particularly in tasks like token classification, such as Named Entity Recognition (NER). Interestingly, the paper titled "Label Supervised LLaMA Finetuning" (available at https://arxiv.org/abs/2310.01208) reveals a significant performance boost in token classification when the causal mask is removed. submitted by /u/seanlee97 [link] [comments]  ( 9 min )
    [R] RA-DIT: Retrieval-Augmented Dual Instruction Tuning
    New paper that proposes instruction-tuning with in-context retrieval-augmentation to improve SOTA LLMs in cases where access to large, external knowledge sources is needed. Tested on LLaMA 65B, 13B and 7B. https://arxiv.org/abs/2310.01352 submitted by /u/todpole3 [link] [comments]  ( 9 min )
    [D] How do you scale computational intensive Python scripts?
    Hey ML Community, I'm wondering how people currently go about scaling their Python programs? Lets say for instances you're doing batch inference using an LLM. Each prediction takes 2-3 minutes to process, how would you go about scaling that to make a million predictions? I'm asking this question because a few months back I started building a tool to quickly parallelize python functions across thousands of machines in the cloud. I'm focused on making the barrier to interact with the cloud extremely low and want to know all the core alternatives out there. Also, if you have any advice on starting a business I'd love to hear it. submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
    [D] What is the highest quality automatic image captioning solution?
    I make very high quality Lora's and finetuned stable diffusion models. These models yield very good results, but more importantly they are very easy to use as I have always captioned my images as one would use natural spoken language (no weird booru tags and all that jazz). The most labor intensive processes in the workflow is image captioning. For example, my last project had almost 10000 images in the data set. Every single image was manually captioned by me as the quality of all automated solutions I tried is subpar and has too many accuracy issues. I have tried Blip auto captioning and LLava, but they still were not accurate enough for what I needed. I am hoping someone here can suggest a solution, if one exists, thanks. submitted by /u/no_witty_username [link] [comments]  ( 9 min )
    [D] (Interview Help) Do you know any good resources for interview case studies in the finance domain (especially dealing in loan and credit cards)
    I'm preparing for a data science interview and am looking for case study prep resources, especially for the financial domain (loans and credit cards). Mainly, I want to understand some good metrics for the financial domain, ways to break down the questions and create a rough data model, kinds of conditions to take into consideration (eg. Seasonality), kinds of effects that can be used expected (like opportunities and risks), etc. Any resources or help is greatly appreciated! submitted by /u/how_the_turn_tablez [link] [comments]  ( 9 min )
  • Open

    Help Restricting Actions
    Hello, I am new to RL, I am currently working on a school project that requires it. I am working on making a model to play a game very similar to wordle, so for the function of this post it may as well be wordle. Right now I am trying to get it to work with this gym https://github.com/zach-lawless/gym-wordle, and I will make my tweaks later. This gym has a multi discrete action space, which makes sense to me for a word, IDK if thats best. To validate words, it has its own exception type. I am trying to train this with stable_baselines3, but the exception keeps being raised, since it is trying to guess garbled words like "xcjhr". Is there a way I can validate actions before they are made so the model is restricted to only guessing valid words? Is there a better way to do this? It doesnt need to be the best, it really only has to sorta work. Any help is appreciated, thanks! submitted by /u/ClackHack [link] [comments]  ( 9 min )
    Looking For Advice on Training and Reward Functions
    Hi Everyone, I'm venturing into a new territory of Reinforcement Learning (RL) through a personal project, despite having a solid background in various other ML domains. I'm developing an RL agent to play Skyjo, a turn-based card game, and I'm encountering some challenges related to reward optimization and game-ending decisions by the agents. I'd appreciate any advice or insights you might have! Project Overview: Objective: Develop an RL model to play Skyjo competitively. Environment: Built using Gymnasium and Pytorch. Agents: Two agents working in tandem - one for card selection (discard/draw) and the other for action and location selection. Training: 4-8 agent instances play against each other. Repository: https://github.com/grantslewis/auto_skyjo Reward Structure: Small p…  ( 10 min )
    My frustration level with Torch/Keras/Tensorflow and DQNs is killing me
    RANT: I've tried every possible example I can get my hands on. I've looked at reference examples. I've looked at Medium articles. I've looked at stuff written by college freshmen. Every example I find for a DQN written either for torch or tensorflow (and either tf_agents or keras), seems to either have a nasty bug preventing it to work or such a severe memory leak that it is unusable. I tried Torch recently and was doing some simple gridworlds. It does fine for tiny gridworlds like 5x5. I decided to push it a little (not much at all) to a known 21x21 gridworld from recognized papers - reference example died and ran out of memory after 3000 episodes - I mean - really? 3000 episodes? I ran on CPU and gave it 64GB. I don't know how much memory this SHOULD take. I can do it in a Q-Table for…  ( 10 min )
    Advice to improve outcome on a turn-based strategy game
    Hello everyone, I'm a total beginner in the reinforcement learning (RL) community, and I would appreciate some advice on a problem I'm currently facing. I've created a simple 2D turn-based game with only movement at the moment (I will also add combat features when I have success with training an AI for the movements). Game The rules are simple : A grid of 14x40 (560 cells in total) 1 Agent with a limited number of Move Point (MP) 1 Target that does not move (atm) The agent can end its turn to get its MP back I already implemented a pathfinding algorithm using A* which works really well but I would like to train an AI to reach the target as fast as possible (turn-wise). Here is a simulation of a state : ​ https://preview.redd.it/0p5yijnb60sb1.png?width=442&format=png&auto=…  ( 10 min )
    Cleanba, our new distributed DRL platform is finally out 🤗
    submitted by /u/vwxyzjn [link] [comments]  ( 8 min )
  • Open

    DSC Weekly 3 October 2023
    Announcements Top Stories In-Depth The post DSC Weekly 3 October 2023 appeared first on Data Science Central.  ( 20 min )
    Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think?
    One of the most impressive generative AI applications I have seen is viperGPT. The image / site explains it best. The steps are: This example, earlier this year, showed the potential of multimodal LLMs And as of last week, that future is upon us ChatGPT can now see, hear & speak. What are the implications… Read More »Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? The post Generative AI Megatrends: ChatGPT can see, hear and speak – but what does it mean when ChatGPT can think? appeared first on Data Science Central.  ( 20 min )
    Cracking the code: The rising demand for data scientists in various industries
    In the ever-evolving landscape of the digital era, the relentless quest for deriving actionable insights from a sea of information has become the cornerstone of innovation and strategy. As businesses and organizations strive to navigate the complex corridors of big data, the spotlight invariably falls upon the expertise of data scientists, the modern-day architects of… Read More »Cracking the code: The rising demand for data scientists in various industries The post Cracking the code: The rising demand for data scientists in various industries appeared first on Data Science Central.  ( 21 min )
    Generative AI megatrends: How many LLMs would you subscribe to?
    I recently subscribed to openAI GPT4 for the OpenAI Code Interpreter/Advanced data analytics. We are using it in our class at the University of Oxford.  Its really cool and we are also waiting the multimodal openAI features Recently, a well known AI critic said that he does not see how Generative AI companies could be… Read More »Generative AI megatrends: How many LLMs would you subscribe to? The post Generative AI megatrends: How many LLMs would you subscribe to? appeared first on Data Science Central.  ( 19 min )
    A few highlights of the Efficient Generative AI Summit (EGAIS)
    Large language models (LLMs) for generating text and vision models for generating images are notoriously inefficient. The larger they get, the more power hungry they become.   Kisaco Research in September hosted a one-day event in Santa Clara dedicated to the topic of generative artificial intelligence (GAI) efficiency, followed by a three-day Summit on Hardware and… Read More »A few highlights of the Efficient Generative AI Summit (EGAIS) The post A few highlights of the Efficient Generative AI Summit (EGAIS) appeared first on Data Science Central.  ( 21 min )
  • Open

    AI copilot enhances human precision for safer aviation
    Designed to ensure safer skies, “Air-Guardian” blends human intuition with machine precision, creating a more symbiotic relationship between pilot and aircraft.  ( 8 min )
  • Open

    Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI
    A diverse research ecosystem is essential to realizing the promise of AI. Accelerate Foundation Models Research aims to expand access to powerful models, engaging academics outside of computer science to pursue a broad range of important opportunities. The post Accelerate Foundation Models Research: Supporting a global academic research ecosystem for AI appeared first on Microsoft Research.  ( 10 min )
  • Open

    Meet the Maker: Robotics Student Rolls Out Autonomous Wheelchair With NVIDIA Jetson
    With the help of AI, robots, tractors and baby strollers — even skate parks — are becoming autonomous. One developer, Kabilan KB, is bringing autonomous-navigation capabilities to wheelchairs, which could help improve mobility for people with disabilities. The undergraduate from the Karunya Institute of Technology and Sciences in Coimbatore, India, is powering his autonomous wheelchair Read article >  ( 6 min )
    CG Geek Makes VFX Look Easy This Week ‘In the NVIDIA Studio’
    Releasing a 3D tutorial dubbed The Easiest VFX Tutorial Ever takes supreme confidence and the skills to back it up. Steve Lund a.k.a. CG Geek — the featured artist of this week’s In the NVIDIA Studio installment — has both in spades.  ( 8 min )
  • Open

    From graph theory to category theory
    Let G be a directed graph whose nodes are the positive integers and whose edges represent relations between two integers. In our first example we’ll draw an edge from x to y if x is a multiple of y. In our second example we’ll draw an edge from x to y if x ≥ y. […] From graph theory to category theory first appeared on John D. Cook.  ( 6 min )
    Test functions
    Test functions are how you can make sense of functions that aren’t really functions. The canonical example is the Dirac delta “function” that is infinite at the origin, zero everywhere else, and integrates to 1. That description is contradictory: a function that is 0 almost everywhere integrates to 0, even if you work in extended […] Test functions first appeared on John D. Cook.  ( 6 min )
    Groups vs Abelian groups: Pedantic or profound?
    This article will probably only be of interest to a small number of readers. Those unfamiliar with category theory may find it bewildering, and those well versed in category theory may find it trivial. My hope is that someone in between, someone just starting to get a handle on category theory, will find it helpful. […] Groups vs Abelian groups: Pedantic or profound? first appeared on John D. Cook.  ( 7 min )
  • Open

    DALL·E 3 system card
    No content preview  ( 1 min )

  • Open

    [Discussion] I didn't do well in Calculus III
    So I got an A in calculus three but I probably didn't deserve it since it was online and all I did was look up the answer and understand the problems given on the test. So I probably have a C level understanding. Will I be tested on calc 3 knowledge in machine learning or should I retake calc 3? submitted by /u/Glittering-Target-87 [link] [comments]  ( 9 min )
    [P] Hand keypoint detection
    Hello Reddit, I have a question regarding the right tool. I'm looking for a tool / model to detect hand-keypoints in a video stream of a person assembling stuff. I know OpenPose is a possible one, also Google MediaPipe. I’m not really getting along with OpenPose and MediaPipe don’t show really good results. In my project, I would like to detect hand keypoints in assembly scenarios. It would be ok to use 2 cameras or a depth camera if necessary. Does anybody knows any models / tools to use? Thanks in advance :) submitted by /u/VGHMD [link] [comments]  ( 9 min )
    [P] Best option for a large, local embedding database?
    Langchain offers a wide array of vector databases for text embedding models. I need to create a vector database for around 3 million sentence embeddings, each being of dimension 384. I'm building a prototype, so it has to be local and free of charge to use. So far, I've hit limits for Chroma (41,666 max). I've also tried Redis, QDrant and FAISS - each of these gets so large that it eats up all the RAM and the process gets killed, or with QDrant, just errors out. I've used Pinecone before, but I don't really want to pay for a prototype as I have plenty of disk space. I was thinking of chunking the 3 million documents into local vector stores of size 41,666 using ChromaDB - but there isn't a whole lot out there about whether Chroma would allow me to merge all ~70 of these smaller databases into a bigger one for search. I also cannot find whether it would be possible to load all 70 of these into memory and search each one individually. So what are my options? My other thought was just creating a large Doc2Vec model, however I would like to use something more sophisticated like Huggingface embedding models. submitted by /u/russ_fegoli [link] [comments]  ( 9 min )
    [D] Proof of convergence for a heavy-ball adaptive step-size algorithm for non-convex functions
    Hello everyone, I am struggling with prooving convergence for an optimizer which uses adaptive step-size with heavy ball algorithm for convex and non-convex functions. In some literature, I could find a regret bound analysis/proof for convex functions and proving that the estimated gradient at t -> inf goes to zero for non-convex functions. There are some assumptions and preconditions: The algorithm is heavy ball momentum with adaptive step-size. ' X_(k+1) = X_k - \eta_k . \nabla(f(x_k)) + \beta(x_k - x_(k-1)) The following assumptions are made: A. The function is smooth. B. The function is Lipschitz. C. The gradients are Lipschitz. I attempt to prove the convergence to a critical point or a local minima. Where the estimate of the gradients at any instance k goes to zero. i.e. E[\nabla(f(x_k))] = 0 as t -> inf. Could anyone please guide me through the process of convergence proof for non-convex functions or give me literature recommendations for the same. Thank you very much in advance. submitted by /u/Loose_Foundation5990 [link] [comments]  ( 9 min )
    [D] open problems after GPT4 capabilities
    We all know that LLMs (and especially foundation models) are extremely functionally capable. Has anyone made a nice list of deficiencies that they show? I know Gary Marcus did so many years ago, but after GPT3 and GPT4 -- what is still unsolved? submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    [D] Hoeffdings inequality, does it make sense practically?
    According to it, increasing the hypotheses set loosens the upper bound between in-sample and out-of-sample error. ​ Can't we subdivide the hypotheses set to multiple ones, ensuring tighter bounds in general? ​ and generally, have you seen it in use before? I have seen a lot of ML projects without anybody mentioning it or anything theoretical. submitted by /u/2azo [link] [comments]  ( 9 min )
    [P] Good models to use for multimodal object detection when both the modalities are image based or some models which support ensembling?
    So basically I have a dataset with images of vehicles in top down view in both RGB and IR, what are some models I can use for both unimodal and multimodal object detection to compare their performance. Links to GitHub repos would be helpful. Thanks submitted by /u/Xyber5 [link] [comments]  ( 9 min )
    Benefits of converting DICOM images to PNG's [P]
    I try to understand what are the benefits to convert DICOM images to PNG's. Context: I have DICOM images which I already extracted the useful meta-data I want to use. Those images are for a task, classification-detection pipeline of some disease. So as I already asked, what are the benefits of converting those DICOM files to PNG's rather then just using pydicom and the dicom pixel_array? Reason I ask this is because I saw many top 5 users on kaggle do this when dealing with DICOM images. If I understand how networks actually works, they get as input an array of pixels as floating point numbers no? So what's the differences between DICOM pixel_array to PNG's pixel array and numpy array or tensor? both are eventually will be fed to the network as a tensor of floating numbers. Is the reason is because PNG's are usually faster to train? Is the reason is because PNG's have more libraries support for preprocessing / augmentation / etc. ? Is the reason is because PNG's are the format many pre-trained models expect to? (I write this knowing it's 99% not true, as mentioned the tensor thing) Thanks in Advance, and Please, forgive my English (I could use AI tools to fix it but I feel addicted already) submitted by /u/01jasper [link] [comments]  ( 9 min )
    [D] What kind of distribution is this?
    Hey guys, I am wondering what kind of distribution my data are following? I want to fit a distribution function to them and use this fitted distribution function to generate new samples with a given mean and standard deviation (python). Any tips for this? Happy to hear your suggestions :) https://preview.redd.it/kdcftvpq8urb1.png?width=408&format=png&auto=webp&s=6163b9f571069e098c9e9a609c3d1cb9910fe1fb submitted by /u/Tigmib [link] [comments]  ( 9 min )
    [R] Efficient Streaming Language Models with Attention Sinks - Meta AI 2023 - StreamingLLM enables Llama-2, Falcon and Pythia to have an infinite context length without any fine-tuning! Allows streaming use of LLMs!
    Paper: https://arxiv.org/abs/2309.17453 Github: https://github.com/mit-han-lab/streaming-llm Abstract: Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly, during the decoding stage, caching previous tokens' Key and Value states (KV) consumes extensive memory. Secondly, popular LLMs cannot generalize to longer texts than the training sequence length. Window attention, where only the most recent KVs are cached, is a natural approach -- but we show that it fails when the text length surpasses the cache size. We observe an interesting phenomenon, namely attention sink, that keeping the KV of initial tokens will largely recover the performance of wind…  ( 9 min )
    [Project] I just released an open-source package, TorchLens, that can extract the activations/metadata from any PyTorch model, and visualize its structure, in just one line of code. I hope it helps you out!
    You just give it any PyTorch model (as-is, no changes needed), and it spits out a data structure with the activations of any layer you want, along with a bunch of metadata about the model and each layer and an optional automatic visualization of the model's computational graph. I hope this greatly speeds up the process of extracting features from models for further analysis, and also serves as an aid in quickly understanding new models. I also hope it'd be helpful for teaching purposes, too. It is meant to work for any PyTorch model whatsoever and I've tested it on hundreds of models (see the "model menagerie" of visualizations below), though it's always possible I've missed some edge case or another. Hope it helps you out--I'm still actively developing it, so let me know if there's anything on your wishlist! https://preview.redd.it/k37nhejvxtrb1.png?width=640&format=png&auto=webp&s=5713a8711110644794e2264d84dd479ede861c5e GitHub Repo Twitter Thread Paper CoLab Tutorial Gallery of Model Visuals submitted by /u/therealjmt91 [link] [comments]  ( 9 min )
    [D] Why Vision Tranformers?
    Transformers have been the new kid on the block, easy to see why with LLMs and and sequential output generation, but I still don't know why vision transformers based on ViT are so hot in the field right now. From my understanding, CNNs are just vastly better than transformers for vision tasks, as its inductive biases allows it to determine the relationship between neighboring features of an image via pooling and filters. However, transformers don't have this kind of inductive bias, and as a result, take much more data and compute to reach similar levels of performance. I read this survey paper on Vision Transformers here: https://arxiv.org/pdf/2012.12556.pdf, which has the performance of CNNs vs various transformer models for CV. Comparing even the best vision transformers to the classic …  ( 10 min )
    [R] Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs
    When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heavy for them. To break through this barrier, researchers from Tsinghua University and Microsoft taught models to combine natural language reasoning with calling external math tools. The key is their new "tool-integrated reasoning" format. Models generate a natural language plan first, then write code to invoke tools like SymPy to solve equations. They take the output results and continue verbal reasoning. By interleaving natural language and symbolic computations, they get the best of both worlds - semantic understanding from language models and rigorous math from tools. They trained versions of the LLaMA model this way, producing their Tool-Integrated Reasoning Agent (TORA). They present some strong results: In evaluations on 10 math datasets, TORA substantially outperformed prior state-of-the-art methods, achieving 13-19% higher accuracy on average. On one competition test, TORA-7B scored 40% accuracy, beating the previous best model by 22 percentage points. This demonstrates that integrating tools directly into the reasoning process can significantly enhance mathematical capabilities, even for large models like GPT-4. However, tough problems involving geometry and advanced algebra are still there. New techniques for symbolic reasoning and spatial understanding will likely be needed to push further. Overall though, tool integration seems a promising path to improve reasoning skills. Applying this to other domains like logic and programming could also be impactful. TLDR: Teaching language models to use math tools helps them solve way more complex problems. Full Paper Summary arXiv Link submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [P] Awesome AI developer productivity Github repo
    Hello everyone, We've begun gathering a variety of AI coding tools used in one place to make things easier for everyone. We're inviting everyone to check out our collection, and maybe even add tools you find useful. You can find the repository here: https://github.com/gaborsoter/awesome-ai-dev-productivity Feel free to explore and contribute! submitted by /u/BootstrapGuy [link] [comments]  ( 9 min )
    [R] On the Biometric Capacity of Generative Face Models
    We developed a statistical model to estimate “How many unique identities can a generative face model generate?” without exhaustively generating a lot of faces. Abstract: There has been tremendous progress in generating realistic faces with high fidelity over the past few years. Despite this progress, a crucial question remains unanswered: “Given a generative face model, how many unique identities can it generate?” In other words, what is the biometric capacity of the generative face model? A scientific basis for answering this question will benefit evaluating and comparing different generative face models and establish an upper bound on their scalability. This paper proposes a statistical approach to estimate the biometric capacity of generated face images in a hyperspherical feature space. We employ our approach on multiple generative models, including unconditional generators like StyleGAN, Latent Diffusion Model, and “Generated Photos,” as well as DCFace, a class-conditional generator. We also estimate capacity w.r.t. demographic attributes such as gender and age. Our capacity estimates indicate that (a) under ArcFace representation at a false acceptance rate (FAR) of 0.1%, StyleGAN3 and DCFace have a capacity upper bound of 1.43 million and 11,900, respectively; (b) the capacity reduces drastically as we lower the desired FAR with an estimate of 17,960 and 562 at FAR of 1% and 10%, respectively, for StyleGAN3; (c) there is no discernible disparity in the capacity w.r.t gender; and (d) for some generative models, there is an appreciable disparity in the capacity w.r.t age. Paper: https://arxiv.org/abs/arXiv:2308.02065 Code: https://github.com/human-analysis/capacity-generative-face-models submitted by /u/VishDev [link] [comments]  ( 9 min )
    [P] Comgra: A library for debugging and understanding neural networks
    I'm a machine learning engineer and researcher. I got fed up with how difficult it is to understand why neural networks behave the way they do, so i wrote a library to help with it. Comgra (computation graph analysis) is a library you can use with pytorch to extract all the tensor data you care about and visualize it graphically in a browser. This allows for a much more detailed analysis of what is happening than the usual approach of using tensorboard. You can go investigate tensors as training proceeds, drill down into individual neurons, inspect single data sets that are of special interest to you, track gradients, compare statistics between different training runs, and more. This tool has saved me a ton of time in my research by letting me check my hypotheses much more quickly than normal and by helping me understand how the different parts of my network really interact. I first published this a month ago and have made some improvements since then. I would be happy to hear even more feedback! My goal is to make this the go-to library used both by novices who want to understand what's going on under the hood, and by researchers in neural architecture design. submitted by /u/Smart-Emu5581 [link] [comments]  ( 9 min )
    [D] The most complete Audio ML toolkit 🚀
    Hugging Face Transformers is a complete audio toolkit that provides state-of-the-art models for all audio tasks, including TTS, ASR, audio embeddings, audio classification and music generation. All you need to do is install the Transformers package: pip install --upgrade transformers And then all of these models can be used in just 3 lines of code: ​ TTS Example usage: from transformers import pipeline generator = pipeline("text-to-speech", model="suno/bark-small") speech = generator("Hey - it's Hugging Face on the phone!") Available models: Bark https://huggingface.co/suno/bark MMS TTS https://huggingface.co/facebook/mms-tts-eng VITS https://huggingface.co/kakao-enterprise/vits-vctk SpeechT5 https://huggingface.co/microsoft/speecht5_tts And more! https://huggingface.co/mo…  ( 9 min )
    [R] The Dawn of LMMs: Preliminary Explorations with GPT-4V(ision) - Microsoft 2023 - 166 Pages!
    Paper: https://arxiv.org/abs/2309.17421 Youtube: https://youtu.be/Q0pP782dSh0?si=MiJAlK5k-KEyQ-Zr Abstract: Large multimodal models (LMMs) extend large language models (LLMs) with multi-sensory skills, such as visual understanding, to achieve stronger generic intelligence. In this paper, we analyze the latest model, GPT-4V(ision), to deepen the understanding of LMMs. The analysis focuses on the intriguing tasks that GPT-4V can perform, containing test samples to probe the quality and genericity of GPT-4V's capabilities, its supported inputs and working modes, and the effective ways to prompt the model. In our approach to exploring GPT-4V, we curate and organize a collection of carefully designed qualitative samples spanning a variety of domains and tasks. Observations from these samples demonstrate that GPT-4V's unprecedented ability in processing arbitrarily interleaved multimodal inputs and the genericity of its capabilities together make GPT-4V a powerful multimodal generalist system. Furthermore, GPT-4V's unique capability of understanding visual markers drawn on input images can give rise to new human-computer interaction methods such as visual referring prompting. We conclude the report with in-depth discussions on the emerging application scenarios and the future research directions for GPT-4V-based systems. We hope that this preliminary exploration will inspire future research on the next-generation multimodal task formulation, new ways to exploit and enhance LMMs to solve real-world problems, and gaining better understanding of multimodal foundation models. https://preview.redd.it/qkytzg2rjqrb1.jpg?width=511&format=pjpg&auto=webp&s=fc306dc6ae64100e993639f8e27583b809bf8a5c https://preview.redd.it/z4kq0l2rjqrb1.jpg?width=507&format=pjpg&auto=webp&s=d4fda59456846fa7a6c9b318b21fc9c544bd2b68 https://preview.redd.it/1ptrkk2rjqrb1.jpg?width=712&format=pjpg&auto=webp&s=2b44fbc949e76fdf20d05b1236f56c87ba5efece ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] NanoPhi, Implementing some of the success of Phi-1.5, with GPT-2(124m)
    Hi, i'm trying to replicate at least some of the success of Phi 1.5 on a model 10x smaller, gpt-2 124m. I have started with model finetuning, and have a simple github with roadmap, https://github.com/VatsaDev/NanoPhi, check it out there! submitted by /u/vatsadev [link] [comments]  ( 9 min )
  • Open

    Code Llama code generation models from Meta are now available via Amazon SageMaker JumpStart
    Today, we are excited to announce Code Llama foundation models, developed by Meta, are available for customers through Amazon SageMaker JumpStart to deploy with one click for running inference. Code Llama is a state-of-the-art large language model (LLM) capable of generating code and natural language about code from both code and natural language prompts. Code […]  ( 11 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 1
    A successful deployment of a machine learning (ML) model in a production environment heavily relies on an end-to-end ML pipeline. Although developing such a pipeline can be challenging, it becomes even more complex when dealing with an edge ML use case. Machine learning at the edge is a concept that brings the capability of running […]  ( 10 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 2
    In Part 1 of this series, we drafted an architecture for an end-to-end MLOps pipeline for a visual quality inspection use case at the edge. It is architected to automate the entire machine learning (ML) process, from data labeling to model training and deployment at the edge. The focus on managed and serverless services reduces […]  ( 9 min )
    Build an end-to-end MLOps pipeline for visual quality inspection at the edge – Part 3
    This is Part 3 of our series where we design and implement an MLOps pipeline for visual quality inspection at the edge. In this post, we focus on how to automate the edge deployment part of the end-to-end MLOps pipeline. We show you how to use AWS IoT Greengrass to manage model inference at the […]  ( 9 min )
  • Open

    [D] RL agenda after LLMs or S4?
    Many other students in my research institution are pretty worried after ChatGPT / LLMs about continuing work in RL and are thinking of leaving the field. What are main the open problems in RL after LLMs and S4 can solve a hefty chunk of sequence learning problems? submitted by /u/Cultural-Average3959 [link] [comments]  ( 9 min )
    RLHF without GAE
    If I already have a trained reward model, say a sentiment classification model, that I'd like to use for PPO-based RLHF, I believe the standard method would be to instantiate the Critic/value function using the reward model, and train it further during PPO, correct? Would it even make sense to try PPO for RLHF without using the GAE term and thus without the value function, and just directly using the reward model's output as the advantage? It seems that this would be require viewing the entire generation as a single action (rather than each token's generation as an action), but most of the articles I've read on RLHF seem to treat it that way. On the other hand, all the code implementations I've seen have an Actor-Critic model producing values at each token, which I think implies that each token is an action. Edit: Apologies if any of this is just me having fundamental gaps in my understanding! submitted by /u/ganzzahl [link] [comments]  ( 9 min )
    3-player graph pursuit game
    So I am trying to find NE using rl algorithms for a turn based deterministic graph pursuit game. I have a way of checking if the strategies of players 1,2,3 are a NE and thought of using Q-Learning and see if it converges to a NE. Thus far it doesnt seem to work and I wonder if I made a mistake. The state is described as: St = [x1 x2 x3 p] where current player is p and x1,x2,x3 are the locations of the players in the graph Players have value functions Q^1(St), Q^2(St), Q^3(St) The way I update my value function is: player i choose e-greedy action a_t and the new state St_new Q^i(St) = (1-alpha)*Q^i(St)+alpha*gamma*Q(St_new) I have tried using a memory buffer but I havent improve the convergence success. I check if the if the values are a NE every 1000 iterations. It only converges for simple graphs. Do you think the way I update my value function is correct? Do you have any other traditional algorithms to suggest? Shall I move to deep learning? I am worried if simple algorithms cant converge the neural networks wont either... I tried to implemenet Nash Q learning following the paper:https://www.jmlr.org/papers/volume4/hu03a/hu03a.pdf but I am not sure if implemented correctly for a turn based game submitted by /u/__gp_ [link] [comments]  ( 9 min )
  • Open

    Save 20 Hours A Week With This 1 Simple ChatGPT Prompt for Productivity
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    AI Anxiety’ Is on the Rise–Here’s How to Manage It
    Artificial intelligence (AI) anxiety is on the rise, but there are ways to manage it. While AI may outperform humans in certain tasks, humans are not yet headed for all-out replacement. Recent research shows that AI programs scored higher than humans in tasks requiring originality, but the highest-rated human ideas were still considered more creative. The rise of generative AI tools in industries like animation has left some professionals anxious about the future of their work. Experts suggest managing AI fears by understanding the historical context of technological advancements and focusing on the benefits and training opportunities that AI brings. Source : https://www.scientificamerican.com/article/ai-anxiety-is-on-the-rise-heres-how-to-manage-it/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Tool-Integrated Reasoning: A New Approach for Math-Savvy LLMs
    When trying to get language models to solve complex math problems, researchers kept running into limits. Models like GPT-3 and ChatGPT still struggle with advanced algebra, calculus, and geometry questions. The math is just too abstract and symbol-heavy for them. To break through this barrier, researchers from Tsinghua University and Microsoft taught models to combine natural language reasoning with calling external math tools. The key is their new "tool-integrated reasoning" format. Models generate a natural language plan first, then write code to invoke tools like SymPy to solve equations. They take the output results and continue verbal reasoning. By interleaving natural language and symbolic computations, they get the best of both worlds - semantic understanding from language models and rigorous math from tools. They trained versions of the LLaMA model this way, producing their Tool-Integrated Reasoning Agent (TORA). They present some strong results: In evaluations on 10 math datasets, TORA substantially outperformed prior state-of-the-art methods, achieving 13-19% higher accuracy on average. On one competition test, TORA-7B scored 40% accuracy, beating the previous best model by 22 percentage points. This demonstrates that integrating tools directly into the reasoning process can significantly enhance mathematical capabilities, even for large models like GPT-4. However, tough problems involving geometry and advanced algebra are still there. New techniques for symbolic reasoning and spatial understanding will likely be needed to push further. Overall though, tool integration seems a promising path to improve reasoning skills. Applying this to other domains like logic and programming could also be impactful. TLDR: Teaching language models to use math tools helps them solve way more complex problems. Full Paper Summary arXiv Link submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Bard, what would it take to fast-track AI morality so that it surpasses human ability by 2025?
    To fast-track AI morality to surpass human ability by 2025, Google, OpenAI, Anthropic, and other developers could: Invest heavily in AI ethics research and development.** For example, Google AI has launched the Responsible AI Research and Development (RAIR) program, which is focused on developing new AI ethics research and tools. Create large-scale datasets of human moral reasoning.** For example, OpenAI has released the Moral Machine dataset, which contains over 1 million crowdsourced responses to moral dilemmas. Develop new AI architectures that are specifically designed for moral reasoning.** For example, Anthropic is developing a new AI architecture called "Moral Machine Learning," which is designed to learn from human moral reasoning data. Establish a global AI ethics research consortium and create a public database of human moral reasoning datasets.** For example, the Partnership on AI has launched the Safeguarding AI initiative, which is working to develop new safety mechanisms for AI systems. Fund research into developing new AI architectures for moral reasoning and develop new AI evaluation metrics for moral performance.** For example, the Moral Machine project is developing new evaluation metrics for AI systems' moral performance. By working together, Google, OpenAI, Anthropic, and other developers can help to fast-track AI morality and create AI systems that are more moral than humans. (199 words) submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    AI & Us Navigating the Digital Renaissance
    submitted by /u/Einsof__ [link] [comments]  ( 8 min )
    Prompt enginnering questions
    Is propt engineering a legit job ?? Is it here to stay ? Is it worth studying ? Best way to study it , land a job or freelancing ? submitted by /u/metasubcon [link] [comments]  ( 8 min )
    What app/program are they using on this Instagram?
    How does one make videos like on this Instagram page? It's unreal. https://instagram.com/nostalgicraindrops?igshid=MzRlODBiNWFlZA== submitted by /u/CK1886 [link] [comments]  ( 8 min )
    ChatGPT Can Now See? Mind-Blowing Ways People Can Use Image Recognition!
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Let’s make a list of the BEST AI TOOLS for students
    Every day, new AI tools appear. There are also AI tools designed to make students' lives easier—from AI essay generators to study organizers. While there are many directories with AI tools, they are often not well-sorted for students. So, I've compiled a list of my favorite AI tools for educational purposes. AI tool How to use for studies Bing Chat - Writing excel formulas - Making graphs and charts - Answers for homework assignments - Researching for a paper Textero.ai - Search for relevant academic sources for essays - Research assistance with the "Ask AI" feature - Essay generation and paper formatting - Structured essay outline creation - Summarizing of texts ChatPDF - Interacting with academic PDFs - Asking specific questions about the content - Quickly locating essential data for assignments Socratic - Breaking down complex homework questions - Providing step-by-step educational guidance - Safe and interactive learning Writely AI - Improving grammar and writing clarity - Creating concise study notes - Feedback for content quality Turnitin - Checking for copied content - Comparing against a vast academic database - Highlighting potential plagiarism Got any to add to the list? Let's share and help each other! submitted by /u/loyallyUrticate [link] [comments]  ( 9 min )
    Tested Dalle, created a monster.
    submitted by /u/Grindmaster_Flash [link] [comments]  ( 8 min )
    Meta's Llama 2 Long outperforms GPT 3.5 and Claude 2
    Meta Platforms recently introduced Llama 2 Long, a revolutionary AI model that outperforms top competitors with its ability to generate accurate responses to long user queries. For the latest advancements in AI, look here first. https://preview.redd.it/geqqd3k5rprb1.png?width=1920&format=png&auto=webp&s=e72a67fc7ef7e85902169f3061529c136beadc87 Meta's new AI model As an enhancement of the original Llama 2, Llama 2 Long deals with larger data containing longer texts and is modified to handle lengthier information sequences. Its stellar performance outshines other models such as OpenAI's GPT-3.5 Turbo and Claude 2. How Llama 2 Long works Meta built different versions of Llama 2, ranging from 7 billion to 70 billion parameters, which refines its learning from data. Llama 2 Long employs Rotary Positional Embedding (RoPE) technique, refining the way it encodes the position of each token, allowing fewer data and memory to produce precise responses. The model further fine-tunes its performance using reinforcement learning from human feedback (RLHF), and synthetic data generated by Llama 2 chat itself. Impressive feats and future aspirations Llama 2 Long can create high-quality responses to user prompts up to 200,000 characters long, which is approximately 40 pages of text. Its ability to generate responses to queries on diverse topics such as history, science, literature, and sports indicates its potential to cater to complex and various user needs. The researchers see Llama 2 Long as a step towards broader, more adaptable AI models, and advocate for more research and dialogue to harness these models responsibly and beneficially. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and developments in AI. Professionals from Meta, Google, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI Image Generator That Is Good At Referencing Pop Culture
    I've recently tried Canva and Dall-E to generate an image that references two popular games, Dark Souls 3 and Baldur's Gate 3. And they both fall on their face. Maybe my prompt is bad but Canva is not getting me what I want. Dall-E ran out of free credits. Do you guys have any recommendations. Midjourney is no longer free now. I would like this to be free and has good references to popular culture. submitted by /u/livingroomsessions [link] [comments]  ( 9 min )
  • Open

    Awarded DAGM MVTec Dissertation Award 2023
    In September, I received the DAGM MVTec dissertation award 2023 for my PhD thesis. DAGM is the German association for pattern recognition and organizes the German Conference on Pattern Recognition (GCPR) which is Germany's prime conference for computer vision and related research areas. I feel particularly honored by this award since my academic career started with my first paper published as part of the young researcher forum at GCPR 2015 in Aachen. The post Awarded DAGM MVTec Dissertation Award 2023 appeared first on David Stutz.  ( 3 min )
  • Open

    Supereggs, squigonometry, and squircles
    The Depths of Wikipedia twitter account posted a screenshot about supereggs that’s popular at the moment. It says there’s no way this is real. they must be making these words up above a screenshot from the Wikipedia article on supereggs saying The definition can be changed to have an equality rather than an inequality; this […] Supereggs, squigonometry, and squircles first appeared on John D. Cook.  ( 5 min )
    Corny AI
    Meredith Whittaker posted on Twitter that In addition to being the best in privacy, Signal is also the best in not subjecting you to corny ‘AI’ features no one asked for or wants. I love the phrase “corny AI.” That’s exactly what a lot of AI features are. “Would you like help composing that tweet?” […] Corny AI first appeared on John D. Cook.  ( 5 min )
    Today’s star
    The star-like image above is today’s exponential sum. The exponential sum page on my site generates a new image each day by putting the numbers of the day’s month, day, and year into the equation and connecting the partial sums in the complex plane. Here m is the month, d is the day, and y […] Today’s star first appeared on John D. Cook.  ( 5 min )
  • Open

    A more effective experimental design for engineering a cell into a new state
    By focusing on causal relationships in genome regulation, a new AI method could help scientists identify new immunotherapy techniques or regenerative therapies.  ( 10 min )
    Is AI in the eye of the beholder?
    Study shows users can be primed to believe certain things about an AI chatbot’s motives, which influences their interactions with the chatbot.  ( 10 min )
  • Open

    Fishing with Neural Nets | Transforming Ecology with Artificial Intelligence
    submitted by /u/plutoandmal [link] [comments]  ( 8 min )
  • Open

    Google at ICCV 2023
    Posted by Shaina Mehta, Program Manager, Google Google is proud to be a Platinum Sponsor of the International Conference on Computer Vision (ICCV 2023), a premier annual conference, which is being held this week in Paris, France. As a leader in computer vision research, Google has a strong presence at this year’s conference with 60 accepted papers and active involvement in 27 workshops and tutorials. Google is also proud to be a Platinum Sponsor for the LatinX in CV workshop. We look forward to sharing some of our extensive computer vision research and expanding our partnership with the broader research community. Attending ICCV 2023? We hope you’ll visit the Google booth to chat with researchers who are actively pursuing the latest innovations in computer vision, and check out som…  ( 94 min )

  • Open

    CGPT-4, how would an AI select the 100 most intelligent people on the planet based on their content rather than on their positions and awards?
    If an AI were to take on this task, it would primarily rely on data mining and natural language processing (NLP) techniques. First, the AI would scrape digital platforms to collect works authored by various people. Platforms could range from academic databases like PubMed or ArXiv to blogs, forums, and social media. The goal would be to capture a wide array of intellectual output, irrespective of the person's titles or accolades. Next, an NLP model would evaluate the collected content based on factors like originality, complexity, and coherence. Advanced sentiment analysis could be employed to gauge the depth of understanding and nuances in the arguments made. Specialized algorithms could also be developed to assess the impact of each piece of work, using metrics such as citations, social shares, or subsequent works that build upon it. The AI would then create a shortlist based on these evaluations. This stage might also involve unsupervised machine learning techniques like clustering to find patterns or commonalities among the top contenders. The final stage would be validation, possibly using reinforcement learning. The AI could simulate various scenarios or problems and predict how the content created by these individuals would contribute to solving them. It would then refine its list based on the simulated outcomes. This all-AI approach would drastically reduce human bias and could be executed relatively quickly. However, it's important to note that any such system would need to be designed carefully to avoid introducing biases present in the training data or algorithms. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    So it's unethical to kill an AI robot
    submitted by /u/bharath_brt [link] [comments]  ( 9 min )
    How Big Tech is co-opting the rising stars of artificial intelligence
    Big Tech's dominance in the artificial intelligence (AI) industry is growing as start-ups like Anthropic rely on their computing power and resources. Despite creating breakthrough AI technology, these start-ups still need the support of Big Tech to scale and succeed. The training of AI systems is expensive and requires specialized computer chips and data centers, which are mostly controlled by Amazon, Google, and Microsoft. Regulators, including the Federal Trade Commission and French competition authorities, are monitoring the industry for signs of anticompetitive behavior. Some business leaders believe that competition and efficiency will eventually drive down the cost of running AI models. Source : https://www.washingtonpost.com/technology/2023/09/30/anthropic-amazon-artificial-intelligence/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Data strategy >> Generative AI strategy
    A strong data strategy is crucial for the success of any AI strategy. Generative AI use cases depend on a healthy data infrastructure, including data governance, observability, catalog, data sharing, and lineage. Many enterprises lack the necessary data infrastructure to deploy customer-facing AI apps confidently. Poor data strategy and infrastructure can derail generative AI efforts. Existing issues with data ecosystems, such as data silos and poor data governance, will have a greater impact on generative AI workloads than new issues. Data silos, poor data discoverability, and the lack of data interoperability can become serious bottlenecks for generative AI apps. Source : https://nextword.substack.com/p/data-strategy-matters-for-generative submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Does anyone know a good AI tool to generate tattoo ideas and song cover art?
    Same as title submitted by /u/No-Educator-59 [link] [comments]  ( 9 min )
    Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes
    When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects. Models trained with registers have: Smoother and more meaningful attention maps Small boosts in downstream performance Way better object discovery abilities The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet! I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs. TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    There's So Many AI Chatbots, But Which One Is The Best? (Complete Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    One-Minute Daily AI News 10/1/2023
    Microsoft Researchers Introduce AutoGen: An Artificial Intelligence Framework for Simplifying the Orchestration, Optimization, and Automation of LLM Workflows.[1] StoriaBoard helps filmmakers, marketers and other storytellers pre-visualize stories. Simply upload your script, select a visual style, and generate hundreds of frames in seconds.[2] Will Hurd Releases A.I. Plan, a First in the Republican Presidential Field.[3] Sam Altman says AI systems will automate some tasks but also lead to ‘new and much better jobs’.[4] Sources: [1] https://www.marktechpost.com/2023/09/30/microsoft-researchers-introduce-autogen-an-artificial-intelligence-framework-for-simplifying-the-orchestration-optimization-and-automation-of-llm-workflows/?amp [2] https://www.producthunt.com/posts/storiaboard [3] https://www.nytimes.com/2023/09/20/us/politics/will-hurd-ai-plan.html [4] https://www.businessinsider.com/openai-sam-altman-ai-will-automate-tasks-create-better-jobs-2023-9?amp submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    This is no time for ease and comfort. It is time to dare and endure. -Winston Churchill
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 9 min )
    Quizlet AI reliability?
    What is everyone’s thoughts on the reliablity of the quizlet AI? I just talked to a friend and she said that she uses the AI to study with quizlet. submitted by /u/immickle [link] [comments]  ( 9 min )
  • Open

    [R] The unsolved mystery at the heard of the "How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions" paper
    submitted by /u/CellWithoutCulture [link] [comments]  ( 9 min )
    [D] How many instructions can LLMs handle before they start to ignore them?
    Prompt engineering frequently involves trying to encode very specific behaviors into a model to steer it a certain direction. In practice, as requirements become more complex, you often end up with fairly lengthy prompts, especially when using methods like RAG. I was wondering, how effective are LLMs at following instructions as the system prompt grows in size and complexity? I did some quick experiments on this and found that, unsurprisingly, GPT-4 can follow a lot of rules (up to 50) quite accurately. But even GPT-3.5 slowly degrades and Llama-2-70b-chat starts to fail after just a few rules. Comparison of performance metrics over increasing rule counts, demonstrating GPT-4's consistent performance and a decline in accuracy for GPT-3.5 and Llama-2-70b-chat. These results are based on …  ( 10 min )
    [R] LangDiversity: software to identify LLM errors
    Due to challenges such as hallucination, detecting errors in the output of a given prompt becomes an important challenge. LangDiversity is an implementation of "diversity measures" that are domain independent and can be used to measure the uncertainty in the result of a language model. Type pip install langdiversity Video: https://www.youtube.com/watch?v=86J_K9mR7lw Web: https://neurosymbolic.asu.edu/llm-correction/ Visit https://github.com/lab-v2/langdiversity Read the paper: https://arxiv.org/abs/2308.11189 https://preview.redd.it/rb0xg1ly8nrb1.png?width=1021&format=png&auto=webp&s=8e57056d24327ca2987abea12a7a9066a825738b submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    [P] Simplest model to run with limited hardware
    We want to run (not train, i.e. think single forward pass only) an ML algorithm on a machine with very limited resources. Which model could we use to show off the possibilities? If the benchmark is an MLP for binary image classification, what else could we do with a similar scale of operations? E.g. Which model is the simplest for e.g. text-to-image generation? Any other ML models that are simple enough to run and if initialized with good params, does something impressive submitted by /u/2i2i_tokenized_time [link] [comments]  ( 9 min )
    [P] Deep Memory, a Way to Boost Retrieval Accuracy by up to +22% for RAG
    submitted by /u/davidbun [link] [comments]  ( 9 min )
    [D] Perplexity.ai Search Feasibility
    I've been using Perplexity.ai for a bit now when it hit me that I don't understand how they can sustain their business model with search. Stuff like Bing search and Google search cost around $5 or more per 1000 searches, so how can they even afford to do this kind of search. Do they have their own search index. Also, I don't know how they pull in the data from these sources so fast? I've played around with some things like this with Langchain with retrieval, but the speed of splitting and tokenizing website html is not very fast. Have they already pre-scrapped the websites from the search results and tokenized them for LLM retrieval? submitted by /u/dragon18456 [link] [comments]  ( 9 min )
    Metagpt use case [D]
    Guys, i am currently working building a project, there are certain tasks like building a ml model using certain use-cases. I wish to automate this task, do u think metagpt is a good fit for the same. Let me know if you need any further information!! EDIT: One of the tasks my app needs to do is to convert image to text (aim to implement image captioning). So, if i give metaGPT the requirements for my project, is it possible it will give me the code which I need. I need to save certain tasks here so that I can focus more on operation and design side. Edit: it seems, such kind of vague questions are not encouraged on this platform, I will work and will straigh away ask questions which are quite good and meet the standards of this platform. Thanks!! Thanks!! Always have a massive respect for this community!! submitted by /u/aristotleTheFake [link] [comments]  ( 9 min )
    [R] Meta, INRIA researchers discover that explicit registers eliminate ViT attention spikes
    When visualizing the inner workings of vision transformers (ViTs), researchers noticed weird spikes of attention on random background patches. This didn't make sense since the models should focus on foreground objects. By analyzing the output embeddings, they found a small number of tokens (2%) had super high vector norms, causing the spikes. The high-norm "outlier" tokens occurred in redundant areas and held less local info but more global info about the image. Their hypothesis is that ViTs learn to identify unimportant patches and recycle them as temporary storage instead of discarding. This enables efficient processing but causes issues. Their fix is simple - just add dedicated "register" tokens that provide storage space, avoiding the recycling side effects. Models trained with registers have: Smoother and more meaningful attention maps Small boosts in downstream performance Way better object discovery abilities The registers give ViTs a place to do their temporary computations without messing stuff up. Just a tiny architecture tweak improves interpretability and performance. Sweet! I think it's cool how they reverse-engineered this model artifact and fixed it with such a small change. More work like this will keep incrementally improving ViTs. TLDR: Vision transformers recycle useless patches to store data, causing problems. Adding dedicated register tokens for storage fixes it nicely. Full summary. Paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Multiple single class segmentation vs single multiclass segmentation models
    submitted by /u/waterstrider123 [link] [comments]  ( 9 min )
    [R] SOTA of Deep-Shallow Encoder-Decoder LLMs for fast inference
    There's some evidence [1] [2] that it's possible to run text2text language model at substantially (potentially on the order of magnitude) better inference speed by keeping the decoder shallow. I'm curious whether some general reasoner SOTA (small model for machine translation available at [3]) style models are publicly available for this sort of thing. If not, how would one go about training one? Would it be necessary to do it entirely from scratch (extremely costly)? Or would it be possible to take, say, Flan-UL2 (20B), chop off its decoder, and train a much smaller decoder on top of it with the UL2 encoder frozen (ie how one trains adapter layers). Assuming the decoder hyperparameters are kept small, would this be possible within reasonable compute budget? Would that even meaningfully converge with small amount of compute (assuming same training objective as is for UL2)? Would the strength (ie somewhat comparable to 10B if we cut 20B in half) transfer from the SOTA encoder, or would cutting off half of the model layers kneecap it too badly? [1] https://arxiv.org/pdf/2006.10369.pdf [2] https://aclanthology.org/2023.sustainlp-1.6.pdf [3] https://github.com/snoop2head/Deep-Encoder-Shallow-Decoder submitted by /u/upalse [link] [comments]  ( 9 min )
    [D] Duplicating layers in large models
    Is there any notable work on duplicating layers in large feed forward models? In contrast to e.g. the brain which is essentially a directed graph most networks utilized nowerdays use a feed forward approach. E.g. transformers are able to attend to past tokens, but generate the tokens in a way where for a given token a given weight is not utilized at different stages in the feed forward pass. In my intuition this would lead to an issue where concepts (factual data as well as learned "algorithms") might be duplicated as they are needed at different depths in the generation process and are sequentially dependent on one another. This does not directly make the model less capable, as it might learn the same concept at two layers sufficiently well, but it reduces the data and parameter efficiency and and might impact generalization capabilities. Using a full on brain like graph might be hard to implement/optimize/scale on current hardware and is tricky with the backprop. But is there any work on duplicating a few layers, placing them at different depths in large models. I would guess that this would be more impactful for large models. One would essentially trade compute for better data efficiency. submitted by /u/floriv1999 [link] [comments]  ( 9 min )
    [n] Introducing r/AudioAI: Any AI You Can Hear!
    I couldn't find any AI sub dedicated to audio, so I’ve created r/AudioAI to serve as a hub for everything at the intersection of artificial intelligence and the world of sounds. AI-driven music, speech, audio production, and all other AI audio technologies. If anyone wants to be part of mod, let me know! submitted by /u/chibop1 [link] [comments]  ( 9 min )
  • Open

    LangDiversity: software to identify LLM errors
    Due to challenges such as hallucination, detecting errors in the output of a given prompt becomes an important challenge. LangDiversity is an implementation of "diversity measures" that are domain independent and can be used to measure the uncertainty in the result of a language model. ​ Type pip install langdiversity Video: https://www.youtube.com/watch?v=86J_K9mR7lw Web: https://neurosymbolic.asu.edu/llm-correction/ Visit https://github.com/lab-v2/langdiversity Read the paper: https://arxiv.org/abs/2308.11189 https://preview.redd.it/o0v8p9g7tmrb1.png?width=1021&format=png&auto=webp&s=ff1ac672b61f96e4669663410769127066a0674d submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
    Equation for what neurons (of 1s that attach parietal region to conscious brain regions) should attach to microprocessor to offload math functions?
    " Bio education below *. Summarization: ~1000 IO neurons attach math regions to conscious regions, low cost 1000-electrod microprocessors can run on radio. * https://youtube.com/watch?v=bhp2CkNDxME Don't want for self; want for professors and humans who program KUKA's/FANUC's for construction, and for who do calculations/optimizations for CUDA, MS Visual Studio and such, but what go up for experimentation should funds allow." sounds fun submitted by /u/2002LuvAbbaLuvU [link] [comments]  ( 9 min )
  • Open

    Reinforcement Learning + Computer Vision listing papers
    Hello everyone! A while back, I stumbled upon an interesting paper that applied Reinforcement Learning to Object Localization. I got fascinated by how computer vision tasks could be transformed into a reinforcement learning problem, making it feel like a Markov decision process ! So, i've decided to create a repository to compile all the existing (published) papers that delve into Reinforcement Learning in Computer Vision : https://github.com/rayanramoul/RLCV-Papers If you have any papers in mind or recommendations to enhance the repository, please don't hesitate to share them. Your input would be greatly appreciated! Thank you! :) submitted by /u/raysamram [link] [comments]  ( 9 min )
    Multi-Agent DQN not learning for Clean Up Game - Reward slowly decreasing
    The environment of the Clean Up game is simple: in a 25*18 grid world, there's dirt spawning on the left side and apples spawning on the other. Agents get a +1 reward for eating an apple (by stepping onto it). Agents clean the dirt also by stepping on it (no reward). Agent can go up, down, left, right. The game goes on for 1000 steps. Apple's spawn probability depends on the amount of dirt (less dirt, higher the probability). Currently, the observation for each agent has the manhatten distance to their closest apple and dirt. I have tried multiple ways of training this, including changing the observation space of the agents. But it seems the result does not outperform random agents by any significant amount. The network is simple, it tries to take in all the observations for all the agen…  ( 10 min )
  • Open

    Entity Language Models: Monetizing Language Models – Part 2
    We must move beyond just taming…to monetizing Language Models! In part 1 of this series on Small Language Models (“Use Case Language Models: Taming the LLM Beast – Part 1”), I explored the business and operational value of Use Case-specific Small Language Models (Use Case Language Models). Use case language models are trained or adapted… Read More »Entity Language Models: Monetizing Language Models – Part 2 The post Entity Language Models: Monetizing Language Models – Part 2 appeared first on Data Science Central.  ( 23 min )
  • Open

    Botober 2023
    Since 2019 I've generated October drawing prompts using the year's most state-of-the-art text-generating models. Every year the challenges are different, but this was one of the hardest years yet. Large language models like chatgpt, GPT-4, Bing Chat, and Bard, are all tweaked to produce generic, predictable  ( 6 min )
    Bonus: There was no 2020 Botober?
    AI Weirdness: the strange side of machine learning  ( 2 min )

  • Open

    [P]Handling categorical missing data in churn prediction model for telecom data
    I am working on a telecom dataset where I need to fit a model to for predicting churn(yes or no). There are a lot of categorical data with missing values( total values 7043). What is the best way to handle missing data in this case, is it better to ignore it or any other better imputation method? Data columns (total 21 columns): customerID 7043 non-null object gender 7043 non-null object Age 7043 non-null int64 Partner 7043 non-null object Dependents 7043 non-null object tenure 7043 non-null int64 PhoneService 7043 non-null object MultipleLines 6500 non-null object InternetService 6500 non-null object OnlineSecurity 7043 non-null object OnlineBackup 7043 non-null object DeviceProtection 7043 non-null object TechSupport 7043 non-null object StreamingTV 6500 non-null object StreamingMovies 6500 non-null object Contract 6500 non-null object PaperlessBilling 7043 non-null object PaymentMethod 6500 non-null object MonthlyCharges 7043 non-null float64 TotalCharges 7043 non-null object Churn 7043 non-null object submitted by /u/guyloveskissing [link] [comments]  ( 9 min )
    [D] (How) Can you estimate inference speed of a NN model on given hardware?
    How, outside of testing, do you estimate how quickly a specific model will run on some hardware? Anything about time is rarely mentioned in papers and if it is, it's more likely to talk about training, unless authors are specifically proud of their speed (like YOLO). Even less so in any README. Some way to translate numbers of parameters into seconds on a given GPU/CPU, any rules of thumb better than just setting up everything every time? submitted by /u/teleoflexuous [link] [comments]  ( 9 min )
    [D] How do I begin with AI ?
    I'm fairly new to the Al domain. I've decent python knowledge. I've gone through a lot of YouTube tutorials and got stuck in the tutorial hell. After struggling through hours of videos came here as my only last hope !!. How do I begin? What python frameworks should I learn? Which particular books should I refer ? submitted by /u/Dry_Ad_3887 [link] [comments]  ( 9 min )
    [D] Struggling to get interviews what to do?
    Edit: I am a USA citizen so no need for sponsorship. I have 4 yoe in a start up company and a phd four publications 2 in high level math journals and 2 CV/DL papers in A journals and also 4 patents. I have experience with most common Cv tasks eg object detection, Multi object tracking, 2d/3d human pose estimation and monocular depth estimation. I’m well versed in typical network building blocks eg conv nets, FFNs, transformers, Diffusion etc. I have a little experience with NLP like NLTK and TTS networks. Also some other general dev technologies like ec2,s3,sql,mongoose, etc. That all being said I can’t seem to even get interviews these days just straight rejections not talking to recruiters. On the other hand in 2020, I was just searching for jobs passively and had something like a 75% success rate with getting interviews. I know the job market has changed but I’m a lot more experienced at this time than then and having abysmal luck. Anyone have any advice would be happy to share my resume if that would make it easier to give advice. Also open to hearing what other technologies o should/could learn. submitted by /u/AbjectDrink3276 [link] [comments]  ( 9 min )
    Arxiv [D]ives - Segment Anything
    Every Friday for the past few months we’ve been hosting a public paper club called “Arxiv Dives”. We pick a paper and dive deep into it and chat about it as a group. There are a lot of gems of knowledge hidden in these research papers, and the main motivation is simply to keep up with most impactful techniques in the field by taking the time to dive in and discuss. The attendees so far have been great, and would love for anyone is interested to join! https://lu.ma/oxenbookclub submitted by /u/FallMindless3563 [link] [comments]  ( 9 min )
    [D] What exactly are the compute requirements for training a dense model versus an MoE?
    Hi, New to ML, I can't find a clear answer to this question. I find references online to a 1.8 trillion parameter model taking up the computational power of a 10B model, yet I also hear that the memory requirements a lot higher for an MoE? If I was interested in training/inferencing, for example, a 15M dense model, or a 60M MoE with 4 15M experts. whats the difference gonna be? submitted by /u/vatsadev [link] [comments]  ( 9 min )
    [D] How close are we to Neuro-Symbolic architectures that are 100% accurate?
    I’m new to AI/ML and my understanding is that (1) LLMs are SOTA in many tasks, and their short comings, such as ~70% accuracy, hallucinations, inability to learn from small samples etc, are well known. (2) Neuro-symbolic approaches are apparently the way to get accuracy to 100% and solve other shortcomings. So question is (3) What are the promising research in LLMs+Symbolic architectures? (4) And how close is it to production, rather than academic? (5) Do we need non-LLM based architectures instead? submitted by /u/reeldeele [link] [comments]  ( 9 min )
    [D] How to Integrate fine tuned LLAMA 2 in website ?
    I'm absolute beginner in Machine Learning. Me and My team are building a Chat Bot that recommends medicine based on symptoms, for that we are fine tuning LLAMA 2. Uploading BOOKS to train and we will ask question based on that books. SomeHow I got code on github to FineTune LLAMA 2. But how can I Integrate in my website ? How to connect it in my web app. Need some guidance. We have submission in 2 weeks. If anyone is willing to mentor us in this project or just guide what to do. submitted by /u/BookAny3024 [link] [comments]  ( 9 min )
    [D] What algorithms to use text classification
    I have some data - twitter description of an event in text and the event itself. If I have 100000 tweets in column X and a category in Y - e.g sporting event review, movie review, news, etc what is the best algorithm to match them. Should I make the description a bag of words and depending on the word frequency I can train a ML model (random forest,svm,etc.) or can the algorithm take into account the order. submitted by /u/AnyJello605 [link] [comments]  ( 9 min )
    [D] Deploy the Mistral 7b Generative Model on an A10 GPU on AWS
    Hello, The Mistral 7b AI model beats LLaMA 2 7b on all benchmarks and LLaMA 2 13b in many benchmarks. It is actually even on par with the LLaMA 1 34b model. So I made a quick video about how to deploy this model on an A10 GPU on an AWS EC2 g5.4xlarge instance: https://nlpcloud.com/deploy-mistral-7b-on-a10-gpu-on-aws.html I hope it will be useful. If you have recommendations about how to improve this video please don't hesitate to let me know, that will be very much appreciated! Julien submitted by /u/juliensalinas [link] [comments]  ( 9 min )
    [D] CIDEr values in PaLI model and XM 3600 dataset
    I am reading PaLI: A Jointly-Scaled Multilingual Language-Image Model . In their table 2 (page 6), it's reported that Thapliyal et al. (2022) (0.8B) model got 57.6 of CIDEr on XM 3600 for English. Thapliyal et al. (2022) is Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset. However in this paper, the CIDEr values are reported less than 1. For example, the largest model got 0.584 of CIDEr on XM 3600 for English. Could someone explain to me why those values have great differences? submitted by /u/KingsmanVince [link] [comments]  ( 9 min )
    [R] Pathway to self-learning mathematics and statistics for ML research
    Hey everyone. I am very passionate about getting in ML research and was wondering what the learning pathway was, particularly with regards to the theoretical Math and Statistics involved. For context: I am a second year undergraduate. By the end of this year, I will have taken and finished A Multivariable Calculus with Proofs course, so that is my current starting point. I have been working with ML for the last 3 years and am proficient in Python and frameworks like PyTorch. I have also made my own implementation of several research papers (LSTMs, GRUs, Transformers, ELMo, BERT, GPT, as well as a few computer vision papers). I have a good general intuition of how deep learning works, but I want to formalise this knowledge with the adequate mathematical background so that I can eventually pursue a career in research. I understand that I have plenty of time until I reach there, and I am willing to dedicate it to grinding out the math and statistical knowledge required. I have done my research on this sub and other forums, and here are a few resources that stood out: Mathematics for Machine Learning by Deisenroth, Faisal and Ong Advanced Calculus of Several Variables by C. H. Edwards Jr. Mathematical Methods Lecture Notes from Imperial College by Deisenroth and Cheraghchi The original information theory paper by Shannon The Elements of Statistical Learning by Hastie, Tibshirani and Friedman Pattern Recognition and Machine Learning by Bishop The Probabalistic Machine Learning Series by Kevin P. Murphy Deep Learning by Goodfellow, Bengio and Courville Mathematics of Machine Learning on MIT OCW (here) My question is, what order should I start self-learning in, given the (somewhat limited) background knowledge I have? Also, are there any other resources that would help? submitted by /u/Far_Clothes_5054 [link] [comments]  ( 10 min )
    [D] What is the best open-source framework to create a synthetic and domain specific dataset for fine-tuning small models?
    Hi everyone, With the different data points, such as phi-1.5 performance being as good as 7b models on some tasks, it seems to be plausible that small models can be quite capable on specific tasks. I am working on BlindChat, an open-source and private solution to run small LLMs on your browser and I am interested in fine-tuning a phi-1.5 on some domain specific data. I am thinking of having an approach similar to the researchers of the phi paper, which is creating a high quality dataset using GPT3.5 / GPT4. Do you know good open-source frameworks that make it easy to create a high quality data for a specific task using an existing large model, like GPT3.5/4 or Llama 2 70b? submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [P] How do I train or tune an LLM like LLaMA for my business
    I want to tune Facebook's LLaMA or any available LLM model to be able to answer questions about my business. The idea is to provide a prompt of the business and some Q&As, then based on the provided information, the AI chatbot will answer customers who ask questions about the business. If the answers to the questions are not known or the question is not relevant, the bot should say "I dont know". submitted by /u/the_aceix [link] [comments]  ( 9 min )
  • Open

    I have blocked user u/NuseAI ...
    Hi, I have never done this before, but I have blocked user u/NuseAI from my feeds He/she is posting 'news' all over the AI subs, including this one, at the moment and is filling up my timeline ... and I simply don't feel right about what they are up to. Is it an AI bot? Is it a karma farmer? Is it some sort of spam? Am I being over cautious? If the consensus is that they are a normal poster - fine - I'll reenable their posts. In the meantime I'm enjoying a less cluttered feed! ​ submitted by /u/MrEloi [link] [comments]  ( 9 min )
    Counterfeit people': The danger posed by Meta’s AI celebrity lookalike chatbots
    Meta has launched chatbots with personalities similar to certain celebrities, which some experts believe could be dangerous. These chatbots have their own faces and social media accounts, and Meta is working on giving them a voice. However, experts argue that the idea of chatbots with personalities is impossible, as algorithms cannot demonstrate intention or free will. There is also a risk that chatbots with personalities could express problematic opinions, as seen in Meta's testing. Meta's project is driven by profit, as users are more likely to engage with chatbots that seem human. Experts believe that Meta should have explained the limits of these chatbots instead of emphasizing their human characteristics. Source : https://www.france24.com/en/technology/20230930-counterfeit-people-the-dangers-posed-by-meta-s-ai-celebrity-lookalike-chatbots submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Artificially Intelligent, Genuinely Creative: How AI's Triumph Over Human Creators Exposes the Illusion of Intellectual Property
    submitted by /u/DukeWilder [link] [comments]  ( 9 min )
    Is my domain name a good idea? What can I build on it? Go Go AI Go dot com .... No webpage on it now, any good ideas???
    I was cooking chicken wings one evening ago in the not too distant past and this idea popped into my head. Before the night was over I went online and bought the domain name of GoGoAIGo . com and then the .ai version also. I put the dot com version up on Sedo (sedo.com/search/details/?domain=GoGoAIGo.com) for sale and I actually now own the .com .ai .org and .net versions of that phrase. Not only my decade but the two generational decades in front of me and the one generational decade behind me can remember our ole Inspector Gadget friend whom had a similar phrase, but not exact, that he would say. I'm an individual whom may hold onto something if I feel it has intrinsic value for a future development, which I think this can if laid out in an appropriate fashion. I'm working on another business project right now and I own some trademarks for my other business project so I'm not exactly a newbie in ways here I'm just kind of fresh to the AI realm studies. I think it's overblown right now but will be fine tuned over the next 5-7 years better and society will find a better seat for it. I could see this domain being like a search engine or something, maybe even something to do with robots. I expect AI robots moving forward will be regulated and have various classes that they are placed into as we integrate certain ones in our society. Let's be honest, the light-switch isn't flipping overnight or even in one quick year over this AI stuff. I'm in no rush to have a piece of AI wash my dishes for me to be honest. The last robotic thing I was thinking about getting was a robot mower to cut a field, I believe they are working on those now. Anybody have any unique ideas for me? I used to play with lego robots way back in high school in the early 2000's.... Seems like this website would make a great search engine but honestly there are other phrases that can be put into play with society also. Thanks for any mental stimulation you can toss in my direction. submitted by /u/Wise_Cut_2543 [link] [comments]  ( 10 min )
    CGPT-4, how could an AI app designed to move people from their screens to better enjoying the people in their life do this?
    Imagine an app that's like a helpful buddy in your pocket, always looking out for the best moments to nudge you into some real-world socializing. For example, say you're a fan of watching sports. The app notices you frequently check scores or read articles on sports sites during weekends. Right before a big game, it pops up and says, "How about inviting some friends over to watch the game?" Now let's talk about making socializing a sort of game. Think of the way Fitbit rewards you for walking 10,000 steps. Similarly, this app could reward you with "social points" for various activities. Invite a friend for coffee? 10 points. Call your mom? 15 points. Organize a barbecue? 50 points. And so on. These points could unlock virtual badges or even real-world rewards like discounts at local restaurants to encourage you to keep going. When it comes to setting personal goals, let's say you've been wanting to improve your relationship with a sibling. You set a goal in the app to have at least one meaningful conversation with them each week. The app then reminds you on a lazy Sunday afternoon, suggesting, "Why not call your sister now? It’s a good time to catch up." And for reflection, after you've hung out with your friends to watch the game or had that talk with your sister, the app asks you to rate how good you felt on a scale of 1-10. Over time, you'll see a graph of your happiness levels correlated with your social activities, making it super clear that quality time with people is a mood booster. The whole idea is to keep it simple but effective, helping you to naturally weave more social interactions into your life without making you feel overwhelmed or stressed. submitted by /u/Georgeo57 [link] [comments]  ( 10 min )
    Is AI a Platform Shift?
    AI has the potential to be a platform shift, similar to previous shifts like personal computers, the internet, and mobile. Platform shifts change the dominant layer that applications are built on and can capture the majority of value from the previous generation. AI could change distribution, business models, and what's possible in workflows. Changes in distribution could lead to new aggregators replacing old ones, making the aggregation of quality content more difficult. The business model may not change significantly, with AI likely being delivered as software-as-a-service. AI can enable new workflows and drastically change existing ones. While incumbents may accrue significant value, new platforms could also replace old ones. Source : https://matt-rickard.com/is-ai-a-platform-shift submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Is there a market for Small Language Models for specific jobs/domains?
    It seems that large language models are getting bigger and bigger, and by growing they need more and more processing power. I know that some LLM developers have made smaller versions to test how small they can be made and function. But what happens when you want a LLM to do a specific job, surely it only needs a fraction of the data a general-purpose model does. Potential benefits of SLMs: Less data. Potentially faster. Less space to hallucinate/go wrong. Smaller set of potentials for complete testing. Running costs reduced. Lower spec hardware needs. Has anyone tried dedicating a LLM to a specific job/task and then optimizing its data size to create a SLM? TLDR; How large does a LLM have to be for a toaster or microwave? Talkie Toaster https://www.youtube.com/watch?v=vLm6oTCFcxQ submitted by /u/Arowx [link] [comments]  ( 9 min )
    Books 3 has revealed thousands of pirated Australian books. In the age of AI, is copyright law still fit for purpose?
    submitted by /u/Jariiari7 [link] [comments]  ( 9 min )
    Deep dive into Mastering Prompt Engineering (Prompt-tier list)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Looking for open source headless text to singing or better yet MIDI to singing software
    Scoured the Internet using all available tools. All I've come up with is proprietary and obsolete software and/or GUI-based software. My goal is to create an ElevenLabs type api but for singing. Something like Flinger (dead) would be ideal. If I can't find it I plan to write it but I'd rather not reinvent the wheel. submitted by /u/geeezeredm [link] [comments]  ( 9 min )
    Is it possible for AI to deeply analyze importance of thousands of daily news?
    I have access to texts of thousands of world news daily. Is it possible to make an AI that would analyze them and sort by importance? All I could find similar is NLP for analyzing text content and extracting keywords, or metadata, but this approach doesn't work well. I want for AI to grasp the essence of news and deeply understand their importance, to comprehend how an event affects many people's lives and has significant impact on society or the world as a whole. submitted by /u/canman44999 [link] [comments]  ( 9 min )
    Dalle-3 has me thinking about my unborn child and reality itself.
    I was able to throw these images together in seconds and it has me stunned. This is all in the first year of mainstream AI. Where are we going to be this time next year.. Philosophically what do you believe is going to happen to our paradigms of reality over the coming years? This is an especially challenging thought because we consume so much content and information digitally. I'm a little worried about how humans will or will not adjust to this incoming technology. I'm having my first child soon and it's interesting to think about what I may have to teach them. That nothing you consume digitally is real, only what you can experience with all 5 senses in your local environment is. Strange thoughts to be having for sure. With peace, Aqua. submitted by /u/Aquaritek [link] [comments]  ( 9 min )
    The Ethical Dilemmas of AI in Sci-Fi and Reality
    An interesting article about ethics and AI in the real world versus what we find in scifi. Exploring points like privacy invasion, possible sentience, control and moral decisions. https://discover.hubpages.com/technology/the-ethical-dilemmas-of-ai-in-sci-fi-and-reality submitted by /u/No_Adhesiveness_7209 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/29/2023
    Meta Platforms (META.O) Chief Executive Mark Zuckerberg on Wednesday rolled out new AI products for consumers, including bots that create photo-realistic images and smart glasses that answer questions, as well as an updated virtual-reality headset.[1] The European Union is examining alleged anticompetitive practices in chips used for artificial intelligence, a market that Nvidia (NVDA.O) dominates, Bloomberg News reported on Friday, citing people familiar with the matter.[2] Sex robots powered by futuristic AI algorithm will one day give humans the best sex of their lives, it has been sensationally claimed.[3] National Security Agency Director Army Gen. Paul M. Nakasone today announced the creation of a new entity to oversee the development and integration of artificial intelligence capabilities within U.S. national security systems.[4] Sources: [1] https://www.reuters.com/technology/meta-signal-future-arvr-investments-annual-connect-conference-2023-09-27/ [2] https://www.reuters.com/technology/eu-starts-early-stage-probe-into-nvidia-dominated-ai-chip-market-abuses-2023-09-29/ [3] https://www.dailystar.co.uk/news/weird-news/sex-robots-using-ai-give-31059169 [4] https://www.defense.gov/News/News-Stories/Article/Article/3541838/ai-security-center-to-open-at-national-security-agency/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Consecutive coupon collector problem
    Coupon collector problem Suppose you have a bag of balls labeled 1 through 1,000. You draw draw balls one at a time and put them back after each draw. How many draws would you have to make before you’ve seen every ball at least once? This is the coupon collector problem with N = 1000, […] Consecutive coupon collector problem first appeared on John D. Cook.  ( 6 min )
  • Open

    Testing RNN with RLlib
    Hi folks! Since you've saved my ass before, maybe you have an idea about my issue here, too. I'm training and testing a custom RNN, but I receive the following error message: File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/utils/threading.py", line 24, in wrapper return func(self, *a, **k) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/policy/torch_policy_v2.py", line 1291, in _compute_action_helper dist_inputs, state_out = self.model(input_dict, state_batches, seq_lens) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/models/modelv2.py", line 259, in __call__ res = self.forward(restored, state or [], seq_lens) File "/home/.conda/envs/ray/lib/python3.9/site-packages/ray/rllib/models/torch/recurrent_net.py", line 92, in forward i…  ( 9 min )
  • Open

    RACH-Space: Reconstructing Adaptive Convex Hull Space with applications in weak supervision. (arXiv:2307.04870v3 [cs.LG] UPDATED)
    We introduce RACH-Space, a novel classification method in ensemble learning. In particular, we show its applicability as a label model for weakly supervised learning. RACH-Space offers simplicity in implementation with minimal assumptions on the data or weak signals. The model is well suited for scenarios where fully labeled data is not available. Our method is built upon geometrical interpretation of the space spanned by weak signals. Our analysis of the high dimensional convex hull structure underlying general set of weak signals bridges geometry with machine learning. Empirical results also demonstrate that RACH-Space works well in practice and compares favorably to best existing label models for weakly supervised learning.  ( 2 min )
    From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity. (arXiv:2309.16512v1 [cs.LG])
    In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.  ( 2 min )
    MHG-GNN: Combination of Molecular Hypergraph Grammar with Graph Neural Network. (arXiv:2309.16374v1 [cs.LG])
    Property prediction plays an important role in material discovery. As an initial step to eventually develop a foundation model for material science, we introduce a new autoencoder called the MHG-GNN, which combines graph neural network (GNN) with Molecular Hypergraph Grammar (MHG). Results on a variety of property prediction tasks with diverse materials show that MHG-GNN is promising.  ( 2 min )
    Group-Agent Reinforcement Learning. (arXiv:2202.05135v4 [cs.LG] UPDATED)
    It can largely benefit the reinforcement learning (RL) process of each agent if multiple geographically distributed agents perform their separate RL tasks cooperatively. Different from multi-agent reinforcement learning (MARL) where multiple agents are in a common environment and should learn to cooperate or compete with each other, in this case each agent has its separate environment and only communicates with others to share knowledge without any cooperative or competitive behaviour as a learning outcome. In fact, this scenario exists widely in real life whose concept can be utilised in many applications, but is not well understood yet and not well formulated. As the first effort, we propose group-agent system for RL as a formulation of this scenario and the third type of RL system with respect to single-agent and multi-agent systems. We then propose a distributed RL framework called DDAL (Decentralised Distributed Asynchronous Learning) designed for group-agent reinforcement learning (GARL). We show through experiments that DDAL achieved desirable performance with very stable training and has good scalability.  ( 2 min )
    AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models. (arXiv:2309.16414v1 [cs.CV])
    Classifiers built upon vision-language models such as CLIP have shown remarkable zero-shot performance across a broad range of image classification tasks. Prior work has studied different ways of automatically creating descriptor sets for every class based on prompt templates, ranging from manually engineered templates over templates obtained from a large language model to templates built from random words and characters. In contrast, deriving zero-shot classifiers from the respective encoded class descriptors has remained nearly unchanged, that is: classify to the class that maximizes the cosine similarity between its averaged encoded class descriptors and the encoded image. However, weighting all class descriptors equally can be suboptimal when certain descriptors match visual clues on a given image better than others. In this work, we propose AutoCLIP, a method for auto-tuning zero-shot classifiers. AutoCLIP assigns to each prompt template per-image weights, which are derived from statistics of class descriptor-image similarities at inference time. AutoCLIP is fully unsupervised, has very low overhead, and can be easily implemented in few lines of code. We show that for a broad range of vision-language models, datasets, and prompt templates, AutoCLIP outperforms baselines consistently and by up to 3 percent point accuracy.  ( 2 min )
    An Uncertainty-Aware Pseudo-Label Selection Framework using Regularized Conformal Prediction. (arXiv:2309.15963v1 [cs.LG])
    Consistency regularization-based methods are prevalent in semi-supervised learning (SSL) algorithms due to their exceptional performance. However, they mainly depend on domain-specific data augmentations, which are not usable in domains where data augmentations are less practicable. On the other hand, Pseudo-labeling (PL) is a general and domain-agnostic SSL approach that, unlike consistency regularization-based methods, does not rely on the domain. PL underperforms due to the erroneous high-confidence predictions from poorly calibrated models. This paper proposes an uncertainty-aware pseudo-label selection framework that employs uncertainty sets yielded by the conformal regularization algorithm to fix the poor calibration neural networks, reducing noisy training data. The codes of this work are available at: https://github.com/matinmoezzi/ups conformal classification  ( 2 min )
    Compilation as a Defense: Enhancing DL Model Attack Robustness via Tensor Optimization. (arXiv:2309.16577v1 [cs.LG])
    Adversarial Machine Learning (AML) is a rapidly growing field of security research, with an often overlooked area being model attacks through side-channels. Previous works show such attacks to be serious threats, though little progress has been made on efficient remediation strategies that avoid costly model re-engineering. This work demonstrates a new defense against AML side-channel attacks using model compilation techniques, namely tensor optimization. We show relative model attack effectiveness decreases of up to 43% using tensor optimization, discuss the implications, and direction of future work.  ( 2 min )
    Compositional Program Generation for Systematic Generalization. (arXiv:2309.16467v1 [cs.LG])
    Compositional generalization is a key ability of humans that enables us to learn new concepts from only a handful examples. Machine learning models, including the now ubiquitous transformers, struggle to generalize in this way, and typically require thousands of examples of a concept during training in order to generalize meaningfully. This difference in ability between humans and artificial neural architectures, motivates this study on a neuro-symbolic architecture called the Compositional Program Generator (CPG). CPG has three key features: modularity, type abstraction, and recursive composition, that enable it to generalize both systematically to new concepts in a few-shot manner, as well as productively by length on various sequence-to-sequence language tasks. For each input, CPG uses a grammar of the input domain and a parser to generate a type hierarchy in which each grammar rule is assigned its own unique semantic module, a probabilistic copy or substitution program. Instances with the same hierarchy are processed with the same composed program, while those with different hierarchies may be processed with different programs. CPG learns parameters for the semantic modules and is able to learn the semantics for new types incrementally. Given a context-free grammar of the input language and a dictionary mapping each word in the source language to its interpretation in the output language, CPG can achieve perfect generalization on the SCAN and COGS benchmarks, in both standard and extreme few-shot settings.  ( 3 min )
    Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples. (arXiv:2309.16143v1 [cs.LG])
    Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.  ( 3 min )
    Instance-Agnostic Geometry and Contact Dynamics Learning. (arXiv:2309.05832v2 [cs.CV] UPDATED)
    This work presents an instance-agnostic learning framework that fuses vision with dynamics to simultaneously learn shape, pose trajectories, and physical properties via the use of geometry as a shared representation. Unlike many contact learning approaches that assume motion capture input and a known shape prior for the collision model, our proposed framework learns an object's geometric and dynamic properties from RGBD video, without requiring either category-level or instance-level shape priors. We integrate a vision system, BundleSDF, with a dynamics system, ContactNets, and propose a cyclic training pipeline to use the output from the dynamics module to refine the poses and the geometry from the vision module, using perspective reprojection. Experiments demonstrate our framework's ability to learn the geometry and dynamics of rigid and convex objects and improve upon the current tracking framework.  ( 2 min )
    Safe Imitation Learning of Nonlinear Model Predictive Control for Flexible Robots. (arXiv:2212.02941v2 [cs.RO] UPDATED)
    Flexible robots may overcome some of the industry's major challenges, such as enabling intrinsically safe human-robot collaboration and achieving a higher load-to-mass ratio. However, controlling flexible robots is complicated due to their complex dynamics, which include oscillatory behavior and a high-dimensional state space. NMPC offers an effective means to control such robots, but its extensive computational demands often limit its application in real-time scenarios. To enable fast control of flexible robots, we propose a framework for a safe approximation of NMPC using imitation learning and a predictive safety filter. Our framework significantly reduces computation time while incurring a slight loss in performance. Compared to NMPC, our framework shows more than a eightfold improvement in computation time when controlling a three-dimensional flexible robot arm in simulation, all while guaranteeing safety constraints. Notably, our approach outperforms conventional reinforcement learning methods. The development of fast and safe approximate NMPC holds the potential to accelerate the adoption of flexible robots in industry.  ( 2 min )
    Deep learning models for price forecasting of financial time series: A review of recent advancements: 2020-2022. (arXiv:2305.04811v2 [q-fin.ST] UPDATED)
    Accurately predicting the prices of financial time series is essential and challenging for the financial sector. Owing to recent advancements in deep learning techniques, deep learning models are gradually replacing traditional statistical and machine learning models as the first choice for price forecasting tasks. This shift in model selection has led to a notable rise in research related to applying deep learning models to price forecasting, resulting in a rapid accumulation of new knowledge. Therefore, we conducted a literature review of relevant studies over the past three years with a view to aiding researchers and practitioners in the field. This review delves deeply into deep learning-based forecasting models, presenting information on model architectures, practical applications, and their respective advantages and disadvantages. In particular, detailed information is provided on advanced models for price forecasting, such as Transformers, generative adversarial networks (GANs), graph neural networks (GNNs), and deep quantum neural networks (DQNNs). The present contribution also includes potential directions for future research, such as examining the effectiveness of deep learning models with complex structures for price forecasting, extending from point prediction to interval prediction using deep learning models, scrutinising the reliability and validity of decomposition ensembles, and exploring the influence of data volume on model performance.  ( 3 min )
    TinyMetaFed: Efficient Federated Meta-Learning for TinyML. (arXiv:2307.06822v3 [cs.LG] UPDATED)
    The field of Tiny Machine Learning (TinyML) has made substantial advancements in democratizing machine learning on low-footprint devices, such as microcontrollers. The prevalence of these miniature devices raises the question of whether aggregating their knowledge can benefit TinyML applications. Federated meta-learning is a promising answer to this question, as it addresses the scarcity of labeled data and heterogeneous data distribution across devices in the real world. However, deploying TinyML hardware faces unique resource constraints, making existing methods impractical due to energy, privacy, and communication limitations. We introduce TinyMetaFed, a model-agnostic meta-learning framework suitable for TinyML. TinyMetaFed facilitates collaborative training of a neural network initialization that can be quickly fine-tuned on new devices. It offers communication savings and privacy protection through partial local reconstruction and Top-P% selective communication, computational efficiency via online learning, and robustness to client heterogeneity through few-shot learning. The evaluations on three TinyML use cases demonstrate that TinyMetaFed can significantly reduce energy consumption and communication overhead, accelerate convergence, and stabilize the training process.  ( 2 min )
    Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance. (arXiv:2309.16604v1 [stat.ML])
    Pairwise comparison of graphs is key to many applications in Machine learning ranging from clustering, kernel-based classification/regression and more recently supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing to successfully leverage Optimal Transport, which provides meaningful distances allowing to compare them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.  ( 2 min )
    Capturing the Diffusive Behavior of the Multiscale Linear Transport Equations by Asymptotic-Preserving Convolutional DeepONets. (arXiv:2306.15891v3 [cs.LG] UPDATED)
    In this paper, we introduce two types of novel Asymptotic-Preserving Convolutional Deep Operator Networks (APCONs) designed to address the multiscale time-dependent linear transport problem. We observe that the vanilla physics-informed DeepONets with modified MLP may exhibit instability in maintaining the desired limiting macroscopic behavior. Therefore, this necessitates the utilization of an asymptotic-preserving loss function. Drawing inspiration from the heat kernel in the diffusion equation, we propose a new architecture called Convolutional Deep Operator Networks, which employ multiple local convolution operations instead of a global heat kernel, along with pooling and activation operations in each filter layer. Our APCON methods possess a parameter count that is independent of the grid size and are capable of capturing the diffusive behavior of the linear transport problem. Finally, we validate the effectiveness of our methods through several numerical examples.  ( 2 min )
    Visual In-Context Learning for Few-Shot Eczema Segmentation. (arXiv:2309.16656v1 [cs.CV])
    Automated diagnosis of eczema from digital camera images is crucial for developing applications that allow patients to self-monitor their recovery. An important component of this is the segmentation of eczema region from such images. Current methods for eczema segmentation rely on deep neural networks such as convolutional (CNN)-based U-Net or transformer-based Swin U-Net. While effective, these methods require high volume of annotated data, which can be difficult to obtain. Here, we investigate the capabilities of visual in-context learning that can perform few-shot eczema segmentation with just a handful of examples and without any need for retraining models. Specifically, we propose a strategy for applying in-context learning for eczema segmentation with a generalist vision model called SegGPT. When benchmarked on a dataset of annotated eczema images, we show that SegGPT with just 2 representative example images from the training dataset performs better (mIoU: 36.69) than a CNN U-Net trained on 428 images (mIoU: 32.60). We also discover that using more number of examples for SegGPT may in fact be harmful to its performance. Our result highlights the importance of visual in-context learning in developing faster and better solutions to skin imaging tasks. Our result also paves the way for developing inclusive solutions that can cater to minorities in the demographics who are typically heavily under-represented in the training data.  ( 2 min )
    Harmonic-NAS: Hardware-Aware Multimodal Neural Architecture Search on Resource-constrained Devices. (arXiv:2309.06612v2 [cs.LG] UPDATED)
    The recent surge of interest surrounding Multimodal Neural Networks (MM-NN) is attributed to their ability to effectively process and integrate multiscale information from diverse data sources. MM-NNs extract and fuse features from multiple modalities using adequate unimodal backbones and specific fusion networks. Although this helps strengthen the multimodal information representation, designing such networks is labor-intensive. It requires tuning the architectural parameters of the unimodal backbones, choosing the fusing point, and selecting the operations for fusion. Furthermore, multimodality AI is emerging as a cutting-edge option in Internet of Things (IoT) systems where inference latency and energy consumption are critical metrics in addition to accuracy. In this paper, we propose Harmonic-NAS, a framework for the joint optimization of unimodal backbones and multimodal fusion networks with hardware awareness on resource-constrained devices. Harmonic-NAS involves a two-tier optimization approach for the unimodal backbone architectures and fusion strategy and operators. By incorporating the hardware dimension into the optimization, evaluation results on various devices and multimodal datasets have demonstrated the superiority of Harmonic-NAS over state-of-the-art approaches achieving up to 10.9% accuracy improvement, 1.91x latency reduction, and 2.14x energy efficiency gain.  ( 2 min )
    Learning Large-Scale MTP$_2$ Gaussian Graphical Models via Bridge-Block Decomposition. (arXiv:2309.13405v2 [cs.LG] UPDATED)
    This paper studies the problem of learning the large-scale Gaussian graphical models that are multivariate totally positive of order two ($\text{MTP}_2$). By introducing the concept of bridge, which commonly exists in large-scale sparse graphs, we show that the entire problem can be equivalently optimized through (1) several smaller-scaled sub-problems induced by a \emph{bridge-block decomposition} on the thresholded sample covariance graph and (2) a set of explicit solutions on entries corresponding to \emph{bridges}. From practical aspect, this simple and provable discipline can be applied to break down a large problem into small tractable ones, leading to enormous reduction on the computational complexity and substantial improvements for all existing algorithms. The synthetic and real-world experiments demonstrate that our proposed method presents a significant speed-up compared to the state-of-the-art benchmarks.  ( 2 min )
    Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy. (arXiv:1911.09307v2 [cs.LG] UPDATED)
    Regularization plays a crucial role in machine learning models, especially for deep neural networks. The existing regularization techniques mainly rely on the i.i.d. assumption and only consider the knowledge from the current sample, without the leverage of the neighboring relationship between samples. In this work, we propose a general regularizer called \textbf{Patch-level Neighborhood Interpolation~(Pani)} that conducts a non-local representation in the computation of networks. Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy. Further, we customize our approach into two kinds of popular regularization methods, namely Virtual Adversarial Training (VAT) and MixUp as well as its variants. The first derived \textbf{Pani VAT} presents a novel way to construct non-local adversarial smoothness by employing patch-level interpolated perturbations. The second derived \textbf{Pani MixUp} method extends the MixUp, and achieves superiority over MixUp and competitive performance over state-of-the-art variants of MixUp method with a significant advantage in computational efficiency. Extensive experiments have verified the effectiveness of our Pani approach in both supervised and semi-supervised settings.  ( 2 min )
    Delay-Aware Hierarchical Federated Learning. (arXiv:2303.12414v4 [cs.LG] UPDATED)
    Federated learning has gained popularity as a means of training models distributed across the wireless edge. The paper introduces delay-aware hierarchical federated learning (DFL) to improve the efficiency of distributed machine learning (ML) model training by accounting for communication delays between edge and cloud. Different from traditional federated learning, DFL leverages multiple stochastic gradient descent iterations on local datasets within each global aggregation period and intermittently aggregates model parameters through edge servers in local subnetworks. During global synchronization, the cloud server consolidates local models with the outdated global model using a local-global combiner, thus preserving crucial elements of both, enhancing learning efficiency under the presence of delay. A set of conditions is obtained to achieve the sub-linear convergence rate of O(1/k) for strongly convex and smooth loss functions. Based on these findings, an adaptive control algorithm is developed for DFL, implementing policies to mitigate energy consumption and communication latency while aiming for sublinear convergence. Numerical evaluations show DFL's superior performance in terms of faster global model convergence, reduced resource consumption, and robustness against communication delays compared to existing FL algorithms. In summary, this proposed method offers improved efficiency and results when dealing with both convex and non-convex loss functions.  ( 2 min )
    On the Trade-offs between Adversarial Robustness and Actionable Explanations. (arXiv:2309.16452v1 [cs.LG])
    As machine learning models are increasingly being employed in various high-stakes settings, it becomes important to ensure that predictions of these models are not only adversarially robust, but also readily explainable to relevant stakeholders. However, it is unclear if these two notions can be simultaneously achieved or if there exist trade-offs between them. In this work, we make one of the first attempts at studying the impact of adversarially robust models on actionable explanations which provide end users with a means for recourse. We theoretically and empirically analyze the cost (ease of implementation) and validity (probability of obtaining a positive model prediction) of recourses output by state-of-the-art algorithms when the underlying models are adversarially robust vs. non-robust. More specifically, we derive theoretical bounds on the differences between the cost and the validity of the recourses generated by state-of-the-art algorithms for adversarially robust vs. non-robust linear and non-linear models. Our empirical results with multiple real-world datasets validate our theoretical results and show the impact of varying degrees of model robustness on the cost and validity of the resulting recourses. Our analyses demonstrate that adversarially robust models significantly increase the cost and reduce the validity of the resulting recourses, thus shedding light on the inherent trade-offs between adversarial robustness and actionable explanations  ( 2 min )
    Cross-Prediction-Powered Inference. (arXiv:2309.16598v1 [stat.ML])
    While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.  ( 2 min )
    Generalizable Heterogeneous Federated Cross-Correlation and Instance Similarity Learning. (arXiv:2309.16286v1 [cs.LG])
    Federated learning is an important privacy-preserving multi-party learning paradigm, involving collaborative learning with others and local updating on private data. Model heterogeneity and catastrophic forgetting are two crucial challenges, which greatly limit the applicability and generalizability. This paper presents a novel FCCL+, federated correlation and similarity learning with non-target distillation, facilitating the both intra-domain discriminability and inter-domain generalization. For heterogeneity issue, we leverage irrelevant unlabeled public data for communication between the heterogeneous participants. We construct cross-correlation matrix and align instance similarity distribution on both logits and feature levels, which effectively overcomes the communication barrier and improves the generalizable ability. For catastrophic forgetting in local updating stage, FCCL+ introduces Federated Non Target Distillation, which retains inter-domain knowledge while avoiding the optimization conflict issue, fulling distilling privileged inter-domain information through depicting posterior classes relation. Considering that there is no standard benchmark for evaluating existing heterogeneous federated learning under the same setting, we present a comprehensive benchmark with extensive representative methods under four domain shift scenarios, supporting both heterogeneous and homogeneous federated settings. Empirical results demonstrate the superiority of our method and the efficiency of modules on various scenarios.  ( 2 min )
    Language models in molecular discovery. (arXiv:2309.16235v1 [physics.chem-ph])
    The success of language models, especially transformer-based architectures, has trickled into other domains giving rise to "scientific language models" that operate on small molecules, proteins or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle as evidenced by promising recent findings in early-stage drug discovery. Here, we review the role of language models in molecular discovery, underlining their strength in de novo drug design, property prediction and reaction chemistry. We highlight valuable open-source software assets thus lowering the entry barrier to the field of scientific language modeling. Last, we sketch a vision for future molecular design that combines a chatbot interface with access to computational chemistry tools. Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.  ( 2 min )
    Synthesizing Stable Reduced-Order Visuomotor Policies for Nonlinear Systems via Sums-of-Squares Optimization. (arXiv:2304.12405v2 [cs.RO] UPDATED)
    We present a method for synthesizing dynamic, reduced-order output-feedback polynomial control policies for control-affine nonlinear systems which guarantees runtime stability to a goal state, when using visual observations and a learned perception module in the feedback control loop. We leverage Lyapunov analysis to formulate the problem of synthesizing such policies. This problem is nonconvex in the policy parameters and the Lyapunov function that is used to prove the stability of the policy. To solve this problem approximately, we propose two approaches: the first solves a sequence of sum-of-squares optimization problems to iteratively improve a policy which is provably-stable by construction, while the second directly performs gradient-based optimization on the parameters of the polynomial policy, and its closed-loop stability is verified a posteriori. We extend our approach to provide stability guarantees in the presence of observation noise, which realistically arises due to errors in the learned perception module. We evaluate our approach on several underactuated nonlinear systems, including pendula and quadrotors, showing that our guarantees translate to empirical stability when controlling these systems from images, while baseline approaches can fail to reliably stabilize the system.
    Probabilistic Invariant Learning with Randomized Linear Classifiers. (arXiv:2308.04412v2 [cs.LG] UPDATED)
    Designing models that are both expressive and preserve known invariances of tasks is an increasingly hard problem. Existing solutions tradeoff invariance for computational or memory resources. In this work, we show how to leverage randomness and design models that are both expressive and invariant but use less resources. Inspired by randomized algorithms, our key insight is that accepting probabilistic notions of universal approximation and invariance can reduce our resource requirements. More specifically, we propose a class of binary classification models called Randomized Linear Classifiers (RLCs). We give parameter and sample size conditions in which RLCs can, with high probability, approximate any (smooth) function while preserving invariance to compact group transformations. Leveraging this result, we design three RLCs that are provably probabilistic invariant for classification tasks over sets, graphs, and spherical data. We show how these models can achieve probabilistic invariance and universality using less resources than (deterministic) neural networks and their invariant counterparts. Finally, we empirically demonstrate the benefits of this new class of models on invariant tasks where deterministic invariant neural networks are known to struggle.
    Set Learning for Accurate and Calibrated Models. (arXiv:2307.02245v3 [cs.LG] UPDATED)
    Model overconfidence and poor calibration are common in machine learning and difficult to account for when applying standard empirical risk minimization. In this work, we propose a novel method to alleviate these problems that we call odd-$k$-out learning (OKO), which minimizes the cross-entropy error for sets rather than for single examples. This naturally allows the model to capture correlations across data examples and achieves both better accuracy and calibration, especially in limited training data and class-imbalanced regimes. Perhaps surprisingly, OKO often yields better calibration even when training with hard labels and dropping any additional calibration parameter tuning, such as temperature scaling. We provide theoretical justification, establishing that OKO naturally yields better calibration, and provide extensive experimental analyses that corroborate our theoretical findings. We emphasize that OKO is a general framework that can be easily adapted to many settings and the trained model can be applied to single examples at inference time, without introducing significant run-time overhead or architecture changes.
    Bringing the Discussion of Minima Sharpness to the Audio Domain: a Filter-Normalised Evaluation for Acoustic Scene Classification. (arXiv:2309.16369v1 [cs.SD])
    The correlation between the sharpness of loss minima and generalisation in the context of deep neural networks has been subject to discussion for a long time. Whilst mostly investigated in the context of selected benchmark data sets in the area of computer vision, we explore this aspect for the audio scene classification task of the DCASE2020 challenge data. Our analysis is based on twodimensional filter-normalised visualisations and a derived sharpness measure. Our exploratory analysis shows that sharper minima tend to show better generalisation than flat minima -even more so for out-of-domain data, recorded from previously unseen devices-, thus adding to the dispute about better generalisation capabilities of flat minima. We further find that, in particular, the choice of optimisers is a main driver of the sharpness of minima and we discuss resulting limitations with respect to comparability. Our code, trained model states and loss landscape visualisations are publicly available.
    Attribute Graph Clustering via Learnable Augmentation. (arXiv:2212.03559v2 [cs.LG] UPDATED)
    Contrastive deep graph clustering (CDGC) utilizes contrastive learning to group nodes into different clusters. Better augmentation techniques benefit the quality of the contrastive samples, thus being one of key factors to improve performance. However, the augmentation samples in existing methods are always predefined by human experiences, and agnostic from the downstream task clustering, thus leading to high human resource costs and poor performance. To this end, we propose an Attribute Graph Clustering method via Learnable Augmentation (\textbf{AGCLA}), which introduces learnable augmentors for high-quality and suitable augmented samples for CDGC. Specifically, we design two learnable augmentors for attribute and structure information, respectively. Besides, two refinement matrices, including the high-confidence pseudo-label matrix and the cross-view sample similarity matrix, are generated to improve the reliability of the learned affinity matrix. During the training procedure, we notice that there exist differences between the optimization goals for training learnable augmentors and contrastive learning networks. In other words, we should both guarantee the consistency of the embeddings as well as the diversity of the augmented samples. Thus, an adversarial learning mechanism is designed in our method. Moreover, a two-stage training strategy is leveraged for the high-confidence refinement matrices. Extensive experimental results demonstrate the effectiveness of AGCLA on six benchmark datasets.
    DynaBench: A benchmark dataset for learning dynamical systems from low-resolution data. (arXiv:2306.05805v2 [cs.LG] UPDATED)
    Previous work on learning physical systems from data has focused on high-resolution grid-structured measurements. However, real-world knowledge of such systems (e.g. weather data) relies on sparsely scattered measuring stations. In this paper, we introduce a novel simulated benchmark dataset, DynaBench, for learning dynamical systems directly from sparsely scattered data without prior knowledge of the equations. The dataset focuses on predicting the evolution of a dynamical system from low-resolution, unstructured measurements. We simulate six different partial differential equations covering a variety of physical systems commonly used in the literature and evaluate several machine learning models, including traditional graph neural networks and point cloud processing models, with the task of predicting the evolution of the system. The proposed benchmark dataset is expected to advance the state of art as an out-of-the-box easy-to-use tool for evaluating models in a setting where only unstructured low-resolution observations are available. The benchmark is available at https://anonymous.4open.science/r/code-2022-dynabench/.
    Vertical Federated Learning: Concepts, Advances and Challenges. (arXiv:2211.12814v4 [cs.LG] UPDATED)
    Vertical Federated Learning (VFL) is a federated learning setting where multiple parties with different features about the same set of users jointly train machine learning models without exposing their raw data or model parameters. Motivated by the rapid growth in VFL research and real-world applications, we provide a comprehensive review of the concept and algorithms of VFL, as well as current advances and challenges in various aspects, including effectiveness, efficiency, and privacy. We provide an exhaustive categorization for VFL settings and privacy-preserving protocols and comprehensively analyze the privacy attacks and defense strategies for each protocol. In the end, we propose a unified framework, termed VFLow, which considers the VFL problem under communication, computation, privacy, as well as effectiveness and fairness constraints. Finally, we review the most recent advances in industrial applications, highlighting open challenges and future directions for VFL.
    The Devil is in the Details: A Deep Dive into the Rabbit Hole of Data Filtering. (arXiv:2309.15954v1 [cs.CV])
    The quality of pre-training data plays a critical role in the performance of foundation models. Popular foundation models often design their own recipe for data filtering, which makes it hard to analyze and compare different data filtering approaches. DataComp is a new benchmark dedicated to evaluating different methods for data filtering. This paper describes our learning and solution when participating in the DataComp challenge. Our filtering strategy includes three stages: single-modality filtering, cross-modality filtering, and data distribution alignment. We integrate existing methods and propose new solutions, such as computing CLIP score on horizontally flipped images to mitigate the interference of scene text, using vision and language models to retrieve training samples for target downstream tasks, rebalancing the data distribution to improve the efficiency of allocating the computational budget, etc. We slice and dice our design choices, provide in-depth analysis, and discuss open questions. Our approach outperforms the best method from the DataComp paper by over 4% on the average performance of 38 tasks and by over 2% on ImageNet.
    Machine Learning Based Analytics for the Significance of Gait Analysis in Monitoring and Managing Lower Extremity Injuries. (arXiv:2309.15990v1 [cs.LG])
    This study explored the potential of gait analysis as a tool for assessing post-injury complications, e.g., infection, malunion, or hardware irritation, in patients with lower extremity fractures. The research focused on the proficiency of supervised machine learning models predicting complications using consecutive gait datasets. We identified patients with lower extremity fractures at an academic center. Patients underwent gait analysis with a chest-mounted IMU device. Using software, raw gait data was preprocessed, emphasizing 12 essential gait variables. Machine learning models including XGBoost, Logistic Regression, SVM, LightGBM, and Random Forest were trained, tested, and evaluated. Attention was given to class imbalance, addressed using SMOTE. We introduced a methodology to compute the Rate of Change (ROC) for gait variables, independent of the time difference between gait analyses. XGBoost was the optimal model both before and after applying SMOTE. Prior to SMOTE, the model achieved an average test AUC of 0.90 (95% CI: [0.79, 1.00]) and test accuracy of 86% (95% CI: [75%, 97%]). Feature importance analysis attributed importance to the duration between injury and gait analysis. Data patterns showed early physiological compensations, followed by stabilization phases, emphasizing prompt gait analysis. This study underscores the potential of machine learning, particularly XGBoost, in gait analysis for orthopedic care. Predicting post-injury complications, early gait assessment becomes vital, revealing intervention points. The findings support a shift in orthopedics towards a data-informed approach, enhancing patient outcomes.
    Neuro-Inspired Hierarchical Multimodal Learning. (arXiv:2309.15877v1 [cs.LG])
    Integrating and processing information from various sources or modalities are critical for obtaining a comprehensive and accurate perception of the real world. Drawing inspiration from neuroscience, we develop the Information-Theoretic Hierarchical Perception (ITHP) model, which utilizes the concept of information bottleneck. Distinct from most traditional fusion models that aim to incorporate all modalities as input, our model designates the prime modality as input, while the remaining modalities act as detectors in the information pathway. Our proposed perception model focuses on constructing an effective and compact information flow by achieving a balance between the minimization of mutual information between the latent state and the input modal state, and the maximization of mutual information between the latent states and the remaining modal states. This approach leads to compact latent state representations that retain relevant information while minimizing redundancy, thereby substantially enhancing the performance of downstream tasks. Experimental evaluations on both the MUStARD and CMU-MOSI datasets demonstrate that our model consistently distills crucial information in multimodal learning scenarios, outperforming state-of-the-art benchmarks.
    Distill to Delete: Unlearning in Graph Networks with Knowledge Distillation. (arXiv:2309.16173v1 [cs.LG])
    Graph unlearning has emerged as a pivotal method to delete information from a pre-trained graph neural network (GNN). One may delete nodes, a class of nodes, edges, or a class of edges. An unlearning method enables the GNN model to comply with data protection regulations (i.e., the right to be forgotten), adapt to evolving data distributions, and reduce the GPU-hours carbon footprint by avoiding repetitive retraining. Existing partitioning and aggregation-based methods have limitations due to their poor handling of local graph dependencies and additional overhead costs. More recently, GNNDelete offered a model-agnostic approach that alleviates some of these issues. Our work takes a novel approach to address these challenges in graph unlearning through knowledge distillation, as it distills to delete in GNN (D2DGN). It is a model-agnostic distillation framework where the complete graph knowledge is divided and marked for retention and deletion. It performs distillation with response-based soft targets and feature-based node embedding while minimizing KL divergence. The unlearned model effectively removes the influence of deleted graph elements while preserving knowledge about the retained graph elements. D2DGN surpasses the performance of existing methods when evaluated on various real-world graph datasets by up to $43.1\%$ (AUC) in edge and node unlearning tasks. Other notable advantages include better efficiency, better performance in removing target elements, preservation of performance for the retained elements, and zero overhead costs. Notably, our D2DGN surpasses the state-of-the-art GNNDelete in AUC by $2.4\%$, improves membership inference ratio by $+1.3$, requires $10.2\times10^6$ fewer FLOPs per forward pass and up to $\mathbf{3.2}\times$ faster.
    Can LLMs Effectively Leverage Structural Information for Graph Learning: When and Why. (arXiv:2309.16595v1 [cs.LG])
    This paper studies Large Language Models (LLMs) for structured data--particularly graphs--a crucial data modality that remains underexplored in the LLM literature. We aim to understand when and why the incorporation of structural information inherent in graph data can improve the prediction performance of LLMs on node classification tasks. To address the ``when'' question, we examine a variety of prompting methods for encoding structural information, in settings where textual node features are either rich or scarce. For the ``why'' questions, we probe into two potential contributing factors to the LLM performance: data leakage and homophily. Our exploration of these questions reveals that (i) LLMs can benefit from structural information, especially when textual node features are scarce; (ii) there is no substantial evidence indicating that the performance of LLMs is significantly attributed to data leakage; and (iii) the performance of LLMs on a target node is strongly positively related to the local homophily ratio of the node.
    MotionLM: Multi-Agent Motion Forecasting as Language Modeling. (arXiv:2309.16534v1 [cs.CV])
    Reliable forecasting of the future behavior of road agents is a critical component to safe planning in autonomous vehicles. Here, we represent continuous trajectories as sequences of discrete motion tokens and cast multi-agent motion prediction as a language modeling task over this domain. Our model, MotionLM, provides several advantages: First, it does not require anchors or explicit latent variable optimization to learn multimodal distributions. Instead, we leverage a single standard language modeling objective, maximizing the average log probability over sequence tokens. Second, our approach bypasses post-hoc interaction heuristics where individual agent trajectory generation is conducted prior to interactive scoring. Instead, MotionLM produces joint distributions over interactive agent futures in a single autoregressive decoding process. In addition, the model's sequential factorization enables temporally causal conditional rollouts. The proposed approach establishes new state-of-the-art performance for multi-agent motion prediction on the Waymo Open Motion Dataset, ranking 1st on the interactive challenge leaderboard.
    Tensor Factorization for Leveraging Cross-Modal Knowledge in Data-Constrained Infrared Object Detection. (arXiv:2309.16592v1 [cs.CV])
    The primary bottleneck towards obtaining good recognition performance in IR images is the lack of sufficient labeled training data, owing to the cost of acquiring such data. Realizing that object detection methods for the RGB modality are quite robust (at least for some commonplace classes, like person, car, etc.), thanks to the giant training sets that exist, in this work we seek to leverage cues from the RGB modality to scale object detectors to the IR modality, while preserving model performance in the RGB modality. At the core of our method, is a novel tensor decomposition method called TensorFact which splits the convolution kernels of a layer of a Convolutional Neural Network (CNN) into low-rank factor matrices, with fewer parameters than the original CNN. We first pretrain these factor matrices on the RGB modality, for which plenty of training data are assumed to exist and then augment only a few trainable parameters for training on the IR modality to avoid over-fitting, while encouraging them to capture complementary cues from those trained only on the RGB modality. We validate our approach empirically by first assessing how well our TensorFact decomposed network performs at the task of detecting objects in RGB images vis-a-vis the original network and then look at how well it adapts to IR images of the FLIR ADAS v1 dataset. For the latter, we train models under scenarios that pose challenges stemming from data paucity. From the experiments, we observe that: (i) TensorFact shows performance gains on RGB images; (ii) further, this pre-trained model, when fine-tuned, outperforms a standard state-of-the-art object detector on the FLIR ADAS v1 dataset by about 4% in terms of mAP 50 score.
    Towards Poisoning Fair Representations. (arXiv:2309.16487v1 [cs.LG])
    Fair machine learning seeks to mitigate model prediction bias against certain demographic subgroups such as elder and female. Recently, fair representation learning (FRL) trained by deep neural networks has demonstrated superior performance, whereby representations containing no demographic information are inferred from the data and then used as the input to classification or other downstream tasks. Despite the development of FRL methods, their vulnerability under data poisoning attack, a popular protocol to benchmark model robustness under adversarial scenarios, is under-explored. Data poisoning attacks have been developed for classical fair machine learning methods which incorporate fairness constraints into shallow-model classifiers. Nonetheless, these attacks fall short in FRL due to notably different fairness goals and model architectures. This work proposes the first data poisoning framework attacking FRL. We induce the model to output unfair representations that contain as much demographic information as possible by injecting carefully crafted poisoning samples into the training data. This attack entails a prohibitive bilevel optimization, wherefore an effective approximated solution is proposed. A theoretical analysis on the needed number of poisoning samples is derived and sheds light on defending against the attack. Experiments on benchmark fairness datasets and state-of-the-art fair representation learning models demonstrate the superiority of our attack.
    Differentially Private Secure Multiplication: Hiding Information in the Rubble of Noise. (arXiv:2309.16105v1 [cs.IT])
    We consider the problem of private distributed multi-party multiplication. It is well-established that Shamir secret-sharing coding strategies can enable perfect information-theoretic privacy in distributed computation via the celebrated algorithm of Ben Or, Goldwasser and Wigderson (the "BGW algorithm"). However, perfect privacy and accuracy require an honest majority, that is, $N \geq 2t+1$ compute nodes are required to ensure privacy against any $t$ colluding adversarial nodes. By allowing for some controlled amount of information leakage and approximate multiplication instead of exact multiplication, we study coding schemes for the setting where the number of honest nodes can be a minority, that is $N< 2t+1.$ We develop a tight characterization privacy-accuracy trade-off for cases where $N < 2t+1$ by measuring information leakage using {differential} privacy instead of perfect privacy, and using the mean squared error metric for accuracy. A novel technical aspect is an intricately layered noise distribution that merges ideas from differential privacy and Shamir secret-sharing at different layers.  ( 2 min )
    Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption. (arXiv:2309.16409v1 [stat.ML])
    The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.  ( 2 min )
    LawBench: Benchmarking Legal Knowledge of Large Language Models. (arXiv:2309.16289v1 [cs.CL])
    Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safe-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark LawBench. LawBench has been meticulously crafted to have precise assessment of the LLMs' legal capabilities from three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs in legal tasks. All data, model predictions and evaluation code are released in https://github.com/open-compass/LawBench/. We hope this benchmark provides in-depth understanding of the LLMs' domain-specified capabilities and speed up the development of LLMs in the legal domain.  ( 3 min )
    Stackelberg Batch Policy Learning. (arXiv:2309.16188v1 [stat.ML])
    Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning diagram as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynamic naturally lends StackelbergLearner to a game-theoretic interpretation and provides a convergence guarantee to differentiable Stackelberg equilibria. From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which shows that our algorithm can learn a best-effort policy that is able to compete against any comparator policy that is covered by batch data. Notably, our theoretical regret guarantees only require realizability without any data coverage and strong function approximation conditions, e.g., Bellman closedness, which is in contrast to prior works lacking such guarantees. Through comprehensive experiments, we find that our algorithm consistently performs as well or better as compared to state-of-the-art methods in batch RL benchmark and real-world datasets.  ( 2 min )
    Improving Adaptive Online Learning Using Refined Discretization. (arXiv:2309.16044v1 [cs.LG])
    We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.  ( 2 min )
    Masked autoencoders are scalable learners of cellular morphology. (arXiv:2309.16064v1 [cs.CV])
    Inferring biological relationships from cellular phenotypes in high-content microscopy screens provides significant opportunity and challenge in biological research. Prior results have shown that deep vision models can capture biological signal better than hand-crafted features. This work explores how weakly supervised and self-supervised deep learning approaches scale when training larger models on larger datasets. Our results show that both CNN- and ViT-based masked autoencoders significantly outperform weakly supervised models. At the high-end of our scale, a ViT-L/8 trained on over 3.5-billion unique crops sampled from 95-million microscopy images achieves relative improvements as high as 28% over our best weakly supervised models at inferring known biological relationships curated from public databases.  ( 2 min )
    Deep Learning Based Uplink Multi-User SIMO Beamforming Design. (arXiv:2309.16603v1 [cs.IT])
    The advancement of fifth generation (5G) wireless communication networks has created a greater demand for wireless resource management solutions that offer high data rates, extensive coverage, minimal latency and energy-efficient performance. Nonetheless, traditional approaches have shortcomings when it comes to computational complexity and their ability to adapt to dynamic conditions, creating a gap between theoretical analysis and the practical execution of algorithmic solutions for managing wireless resources. Deep learning-based techniques offer promising solutions for bridging this gap with their substantial representation capabilities. We propose a novel unsupervised deep learning framework, which is called NNBF, for the design of uplink receive multi-user single input multiple output (MU-SIMO) beamforming. The primary objective is to enhance the throughput by focusing on maximizing the sum-rate while also offering computationally efficient solution, in contrast to established conventional methods. We conduct experiments for several antenna configurations. Our experimental results demonstrate that NNBF exhibits superior performance compared to our baseline methods, namely, zero-forcing beamforming (ZFBF) and minimum mean square error (MMSE) equalizer. Additionally, NNBF is scalable to the number of single-antenna user equipments (UEs) while baseline methods have significant computational burden due to matrix pseudo-inverse operation.
    Astroconformer: The Prospects of Analyzing Stellar Light Curves with Transformer-Based Deep Learning Models. (arXiv:2309.16316v1 [astro-ph.SR])
    Light curves of stars encapsulate a wealth of information about stellar oscillations and granulation, thereby offering key insights into the internal structure and evolutionary state of stars. Conventional asteroseismic techniques have been largely confined to power spectral analysis, neglecting the valuable phase information contained within light curves. While recent machine learning applications in asteroseismology utilizing Convolutional Neural Networks (CNNs) have successfully inferred stellar attributes from light curves, they are often limited by the local feature extraction inherent in convolutional operations. To circumvent these constraints, we present $\textit{Astroconformer}$, a Transformer-based deep learning framework designed to capture long-range dependencies in stellar light curves. Our empirical analysis, which focuses on estimating surface gravity ($\log g$), is grounded in a carefully curated dataset derived from $\textit{Kepler}$ light curves. These light curves feature asteroseismic $\log g$ values spanning from 0.2 to 4.4. Our results underscore that, in the regime where the training data is abundant, $\textit{Astroconformer}$ attains a root-mean-square-error (RMSE) of 0.017 dex around $\log g \approx 3 $. Even in regions where training data are sparse, the RMSE can reach 0.1 dex. It outperforms not only the K-nearest neighbor-based model ($\textit{The SWAN}$) but also state-of-the-art CNNs. Ablation studies confirm that the efficacy of the models in this particular task is strongly influenced by the size of their receptive fields, with larger receptive fields correlating with enhanced performance. Moreover, we find that the attention mechanisms within $\textit{Astroconformer}$ are well-aligned with the inherent characteristics of stellar oscillations and granulation present in the light curves.
    Towards Best Practices of Activation Patching in Language Models: Metrics and Methods. (arXiv:2309.16042v1 [cs.LG])
    Mechanistic interpretability seeks to understand the internal mechanisms of machine learning models, where localization -- identifying the important model components -- is a key step. Activation patching, also known as causal tracing or interchange intervention, is a standard technique for this task (Vig et al., 2020), but the literature contains many variants with little consensus on the choice of hyperparameters or methodology. In this work, we systematically examine the impact of methodological details in activation patching, including evaluation metrics and corruption methods. In several settings of localization and circuit discovery in language models, we find that varying these hyperparameters could lead to disparate interpretability results. Backed by empirical observations, we give conceptual arguments for why certain metrics or methods may be preferred. Finally, we provide recommendations for the best practices of activation patching going forwards.
    GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization. (arXiv:2309.16020v1 [cs.CV])
    Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. This task has considerable challenges due to immense variation in geographic landscapes. The image-to-image retrieval-based approaches fail to solve this problem on a global scale as it is not feasible to construct a large gallery of images covering the entire world. Instead, existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. However, their performance is limited by the predefined classes and often results in inaccurate localizations when an image's location significantly deviates from its class center. To overcome these limitations, we propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations. GeoCLIP's location encoder models the Earth as a continuous function by employing positional encoding through random Fourier features and constructing a hierarchical representation that captures information at varying resolutions to yield a semantically rich high-dimensional feature suitable to use even beyond geo-localization. To the best of our knowledge, this is the first work employing GPS encoding for geo-localization. We demonstrate the efficacy of our method via extensive experiments and ablations on benchmark datasets. We achieve competitive performance with just 20% of training data, highlighting its effectiveness even in limited-data settings. Furthermore, we qualitatively demonstrate geo-localization using a text query by leveraging CLIP backbone of our image encoder.
    Correcting for heterogeneity in real-time epidemiological indicators. (arXiv:2309.16546v1 [cs.LG])
    Auxiliary data sources have become increasingly important in epidemiological surveillance, as they are often available at a finer spatial and temporal resolution, larger coverage, and lower latency than traditional surveillance signals. We describe the problem of spatial and temporal heterogeneity in these signals derived from these data sources, where spatial and/or temporal biases are present. We present a method to use a ``guiding'' signal to correct for these biases and produce a more reliable signal that can be used for modeling and forecasting. The method assumes that the heterogeneity can be approximated by a low-rank matrix and that the temporal heterogeneity is smooth over time. We also present a hyperparameter selection algorithm to choose the parameters representing the matrix rank and degree of temporal smoothness of the corrections. In the absence of ground truth, we use maps and plots to argue that this method does indeed reduce heterogeneity. Reducing heterogeneity from auxiliary data sources greatly increases their utility in modeling and forecasting epidemics.
    High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality. (arXiv:2309.16476v1 [math.ST])
    We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a curious transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for noise distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
    A Spectral Approach for Learning Spatiotemporal Neural Differential Equations. (arXiv:2309.16131v1 [cs.LG])
    Rapidly developing machine learning methods has stimulated research interest in computationally reconstructing differential equations (DEs) from observational data which may provide additional insight into underlying causative mechanisms. In this paper, we propose a novel neural-ODE based method that uses spectral expansions in space to learn spatiotemporal DEs. The major advantage of our spectral neural DE learning approach is that it does not rely on spatial discretization, thus allowing the target spatiotemporal equations to contain long range, nonlocal spatial interactions that act on unbounded spatial domains. Our spectral approach is shown to be as accurate as some of the latest machine learning approaches for learning PDEs operating on bounded domains. By developing a spectral framework for learning both PDEs and integro-differential equations, we extend machine learning methods to apply to unbounded DEs and a larger class of problems.
    AdvDiff: Generating Unrestricted Adversarial Examples using Diffusion Models. (arXiv:2307.12499v2 [cs.LG] UPDATED)
    Unrestricted adversarial attacks present a serious threat to deep learning models and adversarial defense techniques. They pose severe security problems for deep learning applications because they can effectively bypass defense mechanisms. However, previous attack methods often utilize Generative Adversarial Networks (GANs), which are not theoretically provable and thus generate unrealistic examples by incorporating adversarial objectives, especially for large-scale datasets like ImageNet. In this paper, we propose a new method, called AdvDiff, to generate unrestricted adversarial examples with diffusion models. We design two novel adversarial guidance techniques to conduct adversarial sampling in the reverse generation process of diffusion models. These two techniques are effective and stable to generate high-quality, realistic adversarial examples by integrating gradients of the target classifier interpretably. Experimental results on MNIST and ImageNet datasets demonstrate that AdvDiff is effective to generate unrestricted adversarial examples, which outperforms GAN-based methods in terms of attack performance and generation quality.
    Adversarial Examples Might be Avoidable: The Role of Data Concentration in Adversarial Robustness. (arXiv:2309.16096v1 [cs.LG])
    The susceptibility of modern machine learning classifiers to adversarial examples has motivated theoretical results suggesting that these might be unavoidable. However, these results can be too general to be applicable to natural data distributions. Indeed, humans are quite robust for tasks involving vision. This apparent conflict motivates a deeper dive into the question: Are adversarial examples truly unavoidable? In this work, we theoretically demonstrate that a key property of the data distribution -- concentration on small-volume subsets of the input space -- determines whether a robust classifier exists. We further demonstrate that, for a data distribution concentrated on a union of low-dimensional linear subspaces, exploiting data structure naturally leads to classifiers that enjoy good robustness guarantees, improving upon methods for provable certification in certain regimes.
    Infer and Adapt: Bipedal Locomotion Reward Learning from Demonstrations via Inverse Reinforcement Learning. (arXiv:2309.16074v1 [cs.RO])
    Enabling bipedal walking robots to learn how to maneuver over highly uneven, dynamically changing terrains is challenging due to the complexity of robot dynamics and interacted environments. Recent advancements in learning from demonstrations have shown promising results for robot learning in complex environments. While imitation learning of expert policies has been well-explored, the study of learning expert reward functions is largely under-explored in legged locomotion. This paper brings state-of-the-art Inverse Reinforcement Learning (IRL) techniques to solving bipedal locomotion problems over complex terrains. We propose algorithms for learning expert reward functions, and we subsequently analyze the learned functions. Through nonlinear function approximation, we uncover meaningful insights into the expert's locomotion strategies. Furthermore, we empirically demonstrate that training a bipedal locomotion policy with the inferred reward functions enhances its walking performance on unseen terrains, highlighting the adaptability offered by reward learning.
    VAE-based latent-space classification of RNO-G data. (arXiv:2309.16401v1 [astro-ph.HE])
    The Radio Neutrino Observatory in Greenland (RNO-G) is a radio-based ultra-high energy neutrino detector located at Summit Station, Greenland. It is still being constructed, with 7 stations currently operational. Neutrino detection works by measuring Askaryan radiation produced by neutrino-nucleon interactions. A neutrino candidate must be found amidst other backgrounds which are recorded at much higher rates -- including cosmic-rays and anthropogenic noise -- the origins of which are sometimes unknown. Here we describe a method to classify different noise classes using the latent space of a variational autoencoder. The latent space forms a compact representation that makes classification tractable. We analyze data from a noisy and a silent station. The method automatically detects and allows us to qualitatively separate multiple event classes, including physical wind-induced signals, for both the noisy and the quiet station.
    Contrastive Learning of Temporal Distinctiveness for Survival Analysis in Electronic Health Records. (arXiv:2308.13104v2 [cs.LG] UPDATED)
    Survival analysis plays a crucial role in many healthcare decisions, where the risk prediction for the events of interest can support an informative outlook for a patient's medical journey. Given the existence of data censoring, an effective way of survival analysis is to enforce the pairwise temporal concordance between censored and observed data, aiming to utilize the time interval before censoring as partially observed time-to-event labels for supervised learning. Although existing studies mostly employed ranking methods to pursue an ordering objective, contrastive methods which learn a discriminative embedding by having data contrast against each other, have not been explored thoroughly for survival analysis. Therefore, in this paper, we propose a novel Ontology-aware Temporality-based Contrastive Survival (OTCSurv) analysis framework that utilizes survival durations from both censored and observed data to define temporal distinctiveness and construct negative sample pairs with adjustable hardness for contrastive learning. Specifically, we first use an ontological encoder and a sequential self-attention encoder to represent the longitudinal EHR data with rich contexts. Second, we design a temporal contrastive loss to capture varying survival durations in a supervised setting through a hardness-aware negative sampling mechanism. Last, we incorporate the contrastive task into the time-to-event predictive task with multiple loss components. We conduct extensive experiments using a large EHR dataset to forecast the risk of hospitalized patients who are in danger of developing acute kidney injury (AKI), a critical and urgent medical condition. The effectiveness and explainability of the proposed model are validated through comprehensive quantitative and qualitative studies.
    Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation. (arXiv:2309.16429v1 [cs.LG])
    We consider the task of generating diverse and realistic videos guided by natural audio samples from a wide variety of semantic classes. For this task, the videos are required to be aligned both globally and temporally with the input audio: globally, the input audio is semantically associated with the entire output video, and temporally, each segment of the input audio is associated with a corresponding segment of that video. We utilize an existing text-conditioned video generation model and a pre-trained audio encoder model. The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model. As such, it also enables video generation conditioned on text, audio, and, for the first time as far as we can ascertain, on both text and audio. We validate our method extensively on three datasets demonstrating significant semantic diversity of audio-video samples and further propose a novel evaluation metric (AV-Align) to assess the alignment of generated videos with input audio samples. AV-Align is based on the detection and comparison of energy peaks in both modalities. In comparison to recent state-of-the-art approaches, our method generates videos that are better aligned with the input sound, both with respect to content and temporal axis. We also show that videos produced by our method present higher visual quality and are more diverse.
    Graph-level Representation Learning with Joint-Embedding Predictive Architectures. (arXiv:2309.16014v1 [cs.LG])
    Joint-Embedding Predictive Architectures (JEPAs) have recently emerged as a novel and powerful technique for self-supervised representation learning. They aim to learn an energy-based model by predicting the latent representation of a target signal $y$ from a context signal $x$. JEPAs bypass the need for data augmentation and negative samples, which are typically required by contrastive learning, while avoiding the overfitting issues associated with generative-based pretraining. In this paper, we show that graph-level representations can be effectively modeled using this paradigm and propose Graph-JEPA, the first JEPA for the graph domain. In particular, we employ masked modeling to learn embeddings for different subgraphs of the input graph. To endow the representations with the implicit hierarchy that is often present in graph-level concepts, we devise an alternative training objective that consists of predicting the coordinates of the encoded subgraphs on the unit hyperbola in the 2D plane. Extensive validation shows that Graph-JEPA can learn representations that are expressive and competitive in both graph classification and regression problems.
    Causal Policy Gradient for Whole-Body Mobile Manipulation. (arXiv:2305.04866v4 [cs.RO] UPDATED)
    Developing the next generation of household robot helpers requires combining locomotion and interaction capabilities, which is generally referred to as mobile manipulation (MoMa). MoMa tasks are difficult due to the large action space of the robot and the common multi-objective nature of the task, e.g., efficiently reaching a goal while avoiding obstacles. Current approaches often segregate tasks into navigation without manipulation and stationary manipulation without locomotion by manually matching parts of the action space to MoMa sub-objectives (e.g. learning base actions for locomotion objectives and learning arm actions for manipulation). This solution prevents simultaneous combinations of locomotion and interaction degrees of freedom and requires human domain knowledge for both partitioning the action space and matching the action parts to the sub-objectives. In this paper, we introduce Causal MoMa, a new reinforcement learning framework to train policies for typical MoMa tasks that makes use of the most favorable subspace of the robot's action space to address each sub-objective. Causal MoMa automatically discovers the causal dependencies between actions and terms of the reward function and exploits these dependencies through causal policy gradient that reduces gradient variance compared to previous state-of-the-art reinforcement learning algorithms, improving convergence and results. We evaluate the performance of Causal MoMa on three types of simulated robots across different MoMa tasks and demonstrate success in transferring the policies trained in simulation directly to a real robot, where our agent is able to follow moving goals and react to dynamic obstacles while simultaneously and synergistically controlling the whole-body: base, arm, and head. More information at https://sites.google.com/view/causal-moma.
    Tiny Classifier Circuits: Evolving Accelerators for Tabular Data. (arXiv:2303.00031v2 [cs.AR] UPDATED)
    A typical machine learning (ML) development cycle for edge computing is to maximise the performance during model training and then minimise the memory/area footprint of the trained model for deployment on edge devices targeting CPUs, GPUs, microcontrollers, or custom hardware accelerators. This paper proposes a methodology for automatically generating predictor circuits for classification of tabular data with comparable prediction performance to conventional ML techniques while using substantially fewer hardware resources and power. The proposed methodology uses an evolutionary algorithm to search over the space of logic gates and automatically generates a classifier circuit with maximised training prediction accuracy. Classifier circuits are so tiny (i.e., consisting of no more than 300 logic gates) that they are called "Tiny Classifier" circuits, and can efficiently be implemented in ASIC or on an FPGA. We empirically evaluate the automatic Tiny Classifier circuit generation methodology or "Auto Tiny Classifiers" on a wide range of tabular datasets, and compare it against conventional ML techniques such as Amazon's AutoGluon, Google's TabNet and a neural search over Multi-Layer Perceptrons. Despite Tiny Classifiers being constrained to a few hundred logic gates, we observe no statistically significant difference in prediction performance in comparison to the best-performing ML baseline. When synthesised as a Silicon chip, Tiny Classifiers use 8-18x less area and 4-8x less power. When implemented as an ultra-low cost chip on a flexible substrate (i.e., FlexIC), they occupy 10-75x less area and consume 13-75x less power compared to the most hardware-efficient ML baseline. On an FPGA, Tiny Classifiers consume 3-11x fewer resources.
    General In-Hand Object Rotation with Vision and Touch. (arXiv:2309.09979v2 [cs.RO] UPDATED)
    We introduce RotateIt, a system that enables fingertip-based object rotation along multiple axes by leveraging multimodal sensory inputs. Our system is trained in simulation, where it has access to ground-truth object shapes and physical properties. Then we distill it to operate on realistic yet noisy simulated visuotactile and proprioceptive sensory inputs. These multimodal inputs are fused via a visuotactile transformer, enabling online inference of object shapes and physical properties during deployment. We show significant performance improvements over prior methods and the importance of visual and tactile sensing.
    Enhancing Speech Articulation Analysis using a Geometric Transformation of the X-ray Microbeam Dataset. (arXiv:2305.10775v3 [eess.AS] UPDATED)
    Accurate analysis of speech articulation is crucial for speech analysis. However, X-Y coordinates of articulators strongly depend on the anatomy of the speakers and the variability of pellet placements, and existing methods for mapping anatomical landmarks in the X-ray Microbeam Dataset (XRMB) fail to capture the entire anatomy of the vocal tract. In this paper, we propose a new geometric transformation that improves the accuracy of these measurements. Our transformation maps anatomical landmarks' X-Y coordinates along the midsagittal plane onto six relative measures: Lip Aperture (LA), Lip Protusion (LP), Tongue Body Constriction Location (TTCL), Degree (TBCD), Tongue Tip Constriction Location (TTCL) and Degree (TTCD). Our novel contribution is the extension of the palate trace towards the inferred anterior pharyngeal line, which improves measurements of tongue body constriction.
    Efficiency Separation between RL Methods: Model-Free, Model-Based and Goal-Conditioned. (arXiv:2309.16291v1 [cs.LG])
    We prove a fundamental limitation on the efficiency of a wide class of Reinforcement Learning (RL) algorithms. This limitation applies to model-free RL methods as well as a broad range of model-based methods, such as planning with tree search. Under an abstract definition of this class, we provide a family of RL problems for which these methods suffer a lower bound exponential in the horizon for their interactions with the environment to find an optimal behavior. However, there exists a method, not tailored to this specific family of problems, which can efficiently solve the problems in the family. In contrast, our limitation does not apply to several types of methods proposed in the literature, for instance, goal-conditioned methods or other algorithms that construct an inverse dynamics model.  ( 2 min )
    Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics. (arXiv:2309.16109v1 [cs.LG])
    Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
    Multi-Swap $k$-Means++. (arXiv:2309.16384v1 [cs.CG])
    The $k$-means++ algorithm of Arthur and Vassilvitskii (SODA 2007) is often the practitioners' choice algorithm for optimizing the popular $k$-means clustering objective and is known to give an $O(\log k)$-approximation in expectation. To obtain higher quality solutions, Lattanzi and Sohler (ICML 2019) proposed augmenting $k$-means++ with $O(k \log \log k)$ local search steps obtained through the $k$-means++ sampling distribution to yield a $c$-approximation to the $k$-means clustering problem, where $c$ is a large absolute constant. Here we generalize and extend their local search algorithm by considering larger and more sophisticated local search neighborhoods hence allowing to swap multiple centers at the same time. Our algorithm achieves a $9 + \varepsilon$ approximation ratio, which is the best possible for local search. Importantly we show that our approach yields substantial practical improvements, we show significant quality improvements over the approach of Lattanzi and Sohler (ICML 2019) on several datasets.
    A parsimonious, computationally efficient machine learning method for spatial regression. (arXiv:2309.16448v1 [stat.ML])
    We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-word data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. In particular, MPRS is a particularly effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.
    A Metaheuristic for Amortized Search in High-Dimensional Parameter Spaces. (arXiv:2309.16465v1 [q-bio.QM])
    Parameter inference for dynamical models of (bio)physical systems remains a challenging problem. Intractable gradients, high-dimensional spaces, and non-linear model functions are typically problematic without large computational budgets. A recent body of work in that area has focused on Bayesian inference methods, which consider parameters under their statistical distributions and therefore, do not derive point estimates of optimal parameter values. Here we propose a new metaheuristic that drives dimensionality reductions from feature-informed transformations (DR-FFIT) to address these bottlenecks. DR-FFIT implements an efficient sampling strategy that facilitates a gradient-free parameter search in high-dimensional spaces. We use artificial neural networks to obtain differentiable proxies for the model's features of interest. The resulting gradients enable the estimation of a local active subspace of the model within a defined sampling region. This approach enables efficient dimensionality reductions of highly non-linear search spaces at a low computational cost. Our test data show that DR-FFIT boosts the performances of random-search and simulated-annealing against well-established metaheuristics, and improves the goodness-of-fit of the model, all within contained run-time costs.
    Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation. (arXiv:2309.15938v1 [eess.AS])
    In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios. MC-SimCLR learns joint spectral and spatial representations from unlabeled spatial audios, thereby enhancing both event classification and sound localization in downstream tasks. At its core, we propose a multi-level data augmentation pipeline that augments different levels of audio features, including waveforms, Mel spectrograms, and generalized cross-correlation (GCC) features. In addition, we introduce simple yet effective channel-wise augmentation methods to randomly swap the order of the microphones and mask Mel and GCC channels. By using these augmentations, we find that linear layers on top of the learned representation significantly outperform supervised models in terms of both event classification accuracy and localization error. We also perform a comprehensive analysis of the effect of each augmentation method and a comparison of the fine-tuning performance using different amounts of labeled data.
    Learning to Transform for Generalizable Instance-wise Invariance. (arXiv:2309.16672v1 [cs.CV])
    Computer vision research has long aimed to build systems that are robust to spatial transformations found in natural data. Traditionally, this is done using data augmentation or hard-coding invariances into the architecture. However, too much or too little invariance can hurt, and the correct amount is unknown a priori and dependent on the instance. Ideally, the appropriate invariance would be learned from data and inferred at test-time. We treat invariance as a prediction problem. Given any image, we use a normalizing flow to predict a distribution over transformations and average the predictions over them. Since this distribution only depends on the instance, we can align instances before classifying them and generalize invariance across classes. The same distribution can also be used to adapt to out-of-distribution poses. This normalizing flow is trained end-to-end and can learn a much larger range of transformations than Augerino and InstaAug. When used as data augmentation, our method shows accuracy and robustness gains on CIFAR 10, CIFAR10-LT, and TinyImageNet.
    TraCE: Trajectory Counterfactual Explanation Scores. (arXiv:2309.15965v1 [cs.LG])
    Counterfactual explanations, and their associated algorithmic recourse, are typically leveraged to understand, explain, and potentially alter a prediction coming from a black-box classifier. In this paper, we propose to extend the use of counterfactuals to evaluate progress in sequential decision making tasks. To this end, we introduce a model-agnostic modular framework, TraCE (Trajectory Counterfactual Explanation) scores, which is able to distill and condense progress in highly complex scenarios into a single value. We demonstrate TraCE's utility across domains by showcasing its main properties in two case studies spanning healthcare and climate change.
    AnyMAL: An Efficient and Scalable Any-Modality Augmented Language Model. (arXiv:2309.16058v1 [cs.LG])
    We present Any-Modality Augmented Language Model (AnyMAL), a unified model that reasons over diverse input modality signals (i.e. text, image, video, audio, IMU motion sensor), and generates textual responses. AnyMAL inherits the powerful text-based reasoning abilities of the state-of-the-art LLMs including LLaMA-2 (70B), and converts modality-specific signals to the joint textual space through a pre-trained aligner module. To further strengthen the multimodal LLM's capabilities, we fine-tune the model with a multimodal instruction set manually collected to cover diverse topics and tasks beyond simple QAs. We conduct comprehensive empirical analysis comprising both human and automatic evaluations, and demonstrate state-of-the-art performance on various multimodal tasks.
    Uncertainty-Aware Decision Transformer for Stochastic Driving Environments. (arXiv:2309.16397v1 [cs.LG])
    Offline Reinforcement Learning (RL) has emerged as a promising framework for learning policies without active interactions, making it especially appealing for autonomous driving tasks. Recent successes of Transformers inspire casting offline RL as sequence modeling, which performs well in long-horizon tasks. However, they are overly optimistic in stochastic environments with incorrect assumptions that the same goal can be consistently achieved by identical actions. In this paper, we introduce an UNcertainty-awaRE deciSion Transformer (UNREST) for planning in stochastic driving environments without introducing additional transition or complex generative models. Specifically, UNREST estimates state uncertainties by the conditional mutual information between transitions and returns, and segments sequences accordingly. Discovering the `uncertainty accumulation' and `temporal locality' properties of driving environments, UNREST replaces the global returns in decision transformers with less uncertain truncated returns, to learn from true outcomes of agent actions rather than environment transitions. We also dynamically evaluate environmental uncertainty during inference for cautious planning. Extensive experimental results demonstrate UNREST's superior performance in various driving scenarios and the power of our uncertainty estimation strategy.
    Axiomatic Aggregations of Abductive Explanations. (arXiv:2109.03890v5 [cs.LG] UPDATED)
    The recent criticisms of the robustness of post hoc model approximation explanation methods (like LIME and SHAP) have led to the rise of model-precise abductive explanations. For each data point, abductive explanations provide a minimal subset of features that are sufficient to generate the outcome. While theoretically sound and rigorous, abductive explanations suffer from a major issue -- there can be several valid abductive explanations for the same data point. In such cases, providing a single abductive explanation can be insufficient; on the other hand, providing all valid abductive explanations can be incomprehensible due to their size. In this work, we solve this issue by aggregating the many possible abductive explanations into feature importance scores. We propose three aggregation methods: two based on power indices from cooperative game theory and a third based on a well-known measure of causal strength. We characterize these three methods axiomatically, showing that each of them uniquely satisfies a set of desirable properties. We also evaluate them on multiple datasets and show that these explanations are robust to the attacks that fool SHAP and LIME.
    Learning Dissipative Neural Dynamical Systems. (arXiv:2309.16032v1 [cs.LG])
    Consider an unknown nonlinear dynamical system that is known to be dissipative. The objective of this paper is to learn a neural dynamical model that approximates this system, while preserving the dissipativity property in the model. In general, imposing dissipativity constraints during neural network training is a hard problem for which no known techniques exist. In this work, we address the problem of learning a dissipative neural dynamical system model in two stages. First, we learn an unconstrained neural dynamical model that closely approximates the system dynamics. Next, we derive sufficient conditions to perturb the weights of the neural dynamical model to ensure dissipativity, followed by perturbation of the biases to retain the fit of the model to the trajectories of the nonlinear system. We show that these two perturbation problems can be solved independently to obtain a neural dynamical model that is guaranteed to be dissipative while closely approximating the nonlinear system.
    Augmenting LLMs with Knowledge: A survey on hallucination prevention. (arXiv:2309.16459v1 [cs.CL])
    Large pre-trained language models have demonstrated their proficiency in storing factual knowledge within their parameters and achieving remarkable results when fine-tuned for downstream natural language processing tasks. Nonetheless, their capacity to access and manipulate knowledge with precision remains constrained, resulting in performance disparities on knowledge-intensive tasks when compared to task-specific architectures. Additionally, the challenges of providing provenance for model decisions and maintaining up-to-date world knowledge persist as open research frontiers. To address these limitations, the integration of pre-trained models with differentiable access mechanisms to explicit non-parametric memory emerges as a promising solution. This survey delves into the realm of language models (LMs) augmented with the ability to tap into external knowledge sources, including external knowledge bases and search engines. While adhering to the standard objective of predicting missing tokens, these augmented LMs leverage diverse, possibly non-parametric external modules to augment their contextual processing capabilities, departing from the conventional language modeling paradigm. Through an exploration of current advancements in augmenting large language models with knowledge, this work concludes that this emerging research direction holds the potential to address prevalent issues in traditional LMs, such as hallucinations, un-grounded responses, and scalability challenges.
    A Design Toolbox for the Development of Collaborative Distributed Machine Learning Systems. (arXiv:2309.16584v1 [cs.MA])
    To leverage training data for the sufficient training of ML models from multiple parties in a confidentiality-preserving way, various collaborative distributed machine learning (CDML) system designs have been developed, for example, to perform assisted learning, federated learning, and split learning. CDML system designs show different traits, for example, high agent autonomy, machine learning (ML) model confidentiality, and fault tolerance. Facing a wide variety of CDML system designs with different traits, it is difficult for developers to design CDML systems with traits that match use case requirements in a targeted way. However, inappropriate CDML system designs may result in CDML systems failing their envisioned purposes. We developed a CDML design toolbox that can guide the development of CDML systems. Based on the CDML design toolbox, we present CDML system archetypes with distinct key traits that can support the design of CDML systems to meet use case requirements.
    Two-Step Active Learning for Instance Segmentation with Uncertainty and Diversity Sampling. (arXiv:2309.16139v1 [cs.CV])
    Training high-quality instance segmentation models requires an abundance of labeled images with instance masks and classifications, which is often expensive to procure. Active learning addresses this challenge by striving for optimum performance with minimal labeling cost by selecting the most informative and representative images for labeling. Despite its potential, active learning has been less explored in instance segmentation compared to other tasks like image classification, which require less labeling. In this study, we propose a post-hoc active learning algorithm that integrates uncertainty-based sampling with diversity-based sampling. Our proposed algorithm is not only simple and easy to implement, but it also delivers superior performance on various datasets. Its practical application is demonstrated on a real-world overhead imagery dataset, where it increases the labeling efficiency fivefold.
    Robust Offline Reinforcement Learning -- Certify the Confidence Interval. (arXiv:2309.16631v1 [cs.LG])
    Currently, reinforcement learning (RL), especially deep RL, has received more and more attention in the research area. However, the security of RL has been an obvious problem due to the attack manners becoming mature. In order to defend against such adversarial attacks, several practical approaches are developed, such as adversarial training, data filtering, etc. However, these methods are mostly based on empirical algorithms and experiments, without rigorous theoretical analysis of the robustness of the algorithms. In this paper, we develop an algorithm to certify the robustness of a given policy offline with random smoothing, which could be proven and conducted as efficiently as ones without random smoothing. Experiments on different environments confirm the correctness of our algorithm.
    Infinite Neural Network Quantum States: Entanglement and Training Dynamics. (arXiv:2112.00723v2 [quant-ph] UPDATED)
    We study infinite limits of neural network quantum states ($\infty$-NNQS), which exhibit representation power through ensemble statistics, and also tractable gradient descent dynamics. Ensemble averages of Renyi entropies are expressed in terms of neural network correlators, and architectures that exhibit volume-law entanglement are presented. A general framework is developed for studying the gradient descent dynamics of neural network quantum states (NNQS), using a quantum state neural tangent kernel (QS-NTK). For $\infty$-NNQS the training dynamics is simplified, since the QS-NTK becomes deterministic and constant. An analytic solution is derived for quantum state supervised learning, which allows an $\infty$-NNQS to recover any target wavefunction. Numerical experiments on finite and infinite NNQS in the transverse field Ising model and Fermi Hubbard model demonstrate excellent agreement with theory. $\infty$-NNQS opens up new opportunities for studying entanglement and training dynamics in other physics applications, such as in finding ground states.
    Navigating Healthcare Insights: A Birds Eye View of Explainability with Knowledge Graphs. (arXiv:2309.16593v1 [cs.AI])
    Knowledge graphs (KGs) are gaining prominence in Healthcare AI, especially in drug discovery and pharmaceutical research as they provide a structured way to integrate diverse information sources, enhancing AI system interpretability. This interpretability is crucial in healthcare, where trust and transparency matter, and eXplainable AI (XAI) supports decision making for healthcare professionals. This overview summarizes recent literature on the impact of KGs in healthcare and their role in developing explainable AI models. We cover KG workflow, including construction, relationship extraction, reasoning, and their applications in areas like Drug-Drug Interactions (DDI), Drug Target Interactions (DTI), Drug Development (DD), Adverse Drug Reactions (ADR), and bioinformatics. We emphasize the importance of making KGs more interpretable through knowledge-infused learning in healthcare. Finally, we highlight research challenges and provide insights for future directions.
    Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit. (arXiv:2309.16620v1 [stat.ML])
    The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
    AtomSurf : Surface Representation for Learning on Protein Structures. (arXiv:2309.16519v1 [cs.LG])
    Recent advancements in Cryo-EM and protein structure prediction algorithms have made large-scale protein structures accessible, paving the way for machine learning-based functional annotations.The field of geometric deep learning focuses on creating methods working on geometric data. An essential aspect of learning from protein structures is representing these structures as a geometric object (be it a grid, graph, or surface) and applying a learning method tailored to this representation. The performance of a given approach will then depend on both the representation and its corresponding learning method. In this paper, we investigate representing proteins as $\textit{3D mesh surfaces}$ and incorporate them into an established representation benchmark. Our first finding is that despite promising preliminary results, the surface representation alone does not seem competitive with 3D grids. Building on this, we introduce a synergistic approach, combining surface representations with graph-based methods, resulting in a general framework that incorporates both representations in learning. We show that using this combination, we are able to obtain state-of-the-art results across $\textit{all tested tasks}$. Our code and data can be found online: https://github.com/Vincentx15/atom2D .
    Nonlinear MPC design for incrementally ISS systems with application to GRU networks. (arXiv:2309.16428v1 [eess.SY])
    This brief addresses the design of a Nonlinear Model Predictive Control (NMPC) strategy for exponentially incremental Input-to-State Stable (ISS) systems. In particular, a novel formulation is devised, which does not necessitate the onerous computation of terminal ingredients, but rather relies on the explicit definition of a minimum prediction horizon ensuring closed-loop stability. The designed methodology is particularly suited for the control of systems learned by Recurrent Neural Networks (RNNs), which are known for their enhanced modeling capabilities and for which the incremental ISS properties can be studied thanks to simple algebraic conditions. The approach is applied to Gated Recurrent Unit (GRU) networks, providing also a method for the design of a tailored state observer with convergence guarantees. The resulting control architecture is tested on a benchmark system, demonstrating its good control performances and efficient applicability.
    Uncertainty Quantification for Eosinophil Segmentation. (arXiv:2309.16536v1 [eess.IV])
    Eosinophilic Esophagitis (EoE) is an allergic condition increasing in prevalence. To diagnose EoE, pathologists must find 15 or more eosinophils within a single high-power field (400X magnification). Determining whether or not a patient has EoE can be an arduous process and any medical imaging approaches used to assist diagnosis must consider both efficiency and precision. We propose an improvement of Adorno et al's approach for quantifying eosinphils using deep image segmentation. Our new approach leverages Monte Carlo Dropout, a common approach in deep learning to reduce overfitting, to provide uncertainty quantification on current deep learning models. The uncertainty can be visualized in an output image to evaluate model performance, provide insight to how deep learning algorithms function, and assist pathologists in identifying eosinophils.
    Systematic Sampling and Validation of Machine Learning-Parameterizations in Climate Models. (arXiv:2309.16177v1 [physics.ao-ph])
    Progress in hybrid physics-machine learning (ML) climate simulations has been limited by the difficulty of obtaining performant coupled (i.e. online) simulations. While evaluating hundreds of ML parameterizations of subgrid closures (here of convection and radiation) offline is straightforward, online evaluation at the same scale is technically challenging. Our software automation achieves an order-of-magnitude larger sampling of online modeling errors than has previously been examined. Using this, we evaluate the hybrid climate model performance and define strategies to improve it. We show that model online performance improves when incorporating memory, a relative humidity input feature transformation, and additional input variables. We also reveal substantial variation in online error and inconsistencies between offline vs. online error statistics. The implication is that hundreds of candidate ML models should be evaluated online to detect the effects of parameterization design choices. This is considerably more sampling than tends to be reported in the current literature.
    Deep Single Models vs. Ensembles: Insights for a Fast Deployment of Parking Monitoring Systems. (arXiv:2309.16495v1 [cs.CV])
    Searching for available parking spots in high-density urban centers is a stressful task for drivers that can be mitigated by systems that know in advance the nearest parking space available. To this end, image-based systems offer cost advantages over other sensor-based alternatives (e.g., ultrasonic sensors), requiring less physical infrastructure for installation and maintenance. Despite recent deep learning advances, deploying intelligent parking monitoring is still a challenge since most approaches involve collecting and labeling large amounts of data, which is laborious and time-consuming. Our study aims to uncover the challenges in creating a global framework, trained using publicly available labeled parking lot images, that performs accurately across diverse scenarios, enabling the parking space monitoring as a ready-to-use system to deploy in a new environment. Through exhaustive experiments involving different datasets and deep learning architectures, including fusion strategies and ensemble methods, we found that models trained on diverse datasets can achieve 95\% accuracy without the burden of data annotation and model training on the target parking lot
    Dynamic Selection in Algorithmic Decision-making. (arXiv:2108.12547v3 [econ.EM] UPDATED)
    This paper identifies and addresses dynamic selection problems in online learning algorithms with endogenous data. In a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the data influences the choices of decisions, affecting the distribution of future data to be collected and analyzed. We propose an instrumental-variable-based algorithm to correct for the bias. It obtains true parameter values and attains low (logarithmic-like) regret levels. We also prove a central limit theorem for statistical inference. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
    Voting Network for Contour Levee Farmland Segmentation and Classification. (arXiv:2309.16561v1 [cs.CV])
    High-resolution aerial imagery allows fine details in the segmentation of farmlands. However, small objects and features introduce distortions to the delineation of object boundaries, and larger contextual views are needed to mitigate class confusion. In this work, we present an end-to-end trainable network for segmenting farmlands with contour levees from high-resolution aerial imagery. A fusion block is devised that includes multiple voting blocks to achieve image segmentation and classification. We integrate the fusion block with a backbone and produce both semantic predictions and segmentation slices. The segmentation slices are used to perform majority voting on the predictions. The network is trained to assign the most likely class label of a segment to its pixels, learning the concept of farmlands rather than analyzing constitutive pixels separately. We evaluate our method using images from the National Agriculture Imagery Program. Our method achieved an average accuracy of 94.34\%. Compared to the state-of-the-art methods, the proposed method obtains an improvement of 6.96% and 2.63% in the F1 score on average.
    Selective Nonparametric Regression via Testing. (arXiv:2309.16412v1 [stat.ML])
    Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well-studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. Theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
    Jointly Training Large Autoregressive Multimodal Models. (arXiv:2309.15564v2 [cs.LG] UPDATED)
    In recent years, advances in the large-scale pretraining of language and text-to-image models have revolutionized the field of machine learning. Yet, integrating these two modalities into a single, robust model capable of generating seamless multimodal outputs remains a significant challenge. To address this gap, we present the Joint Autoregressive Mixture (JAM) framework, a modular approach that systematically fuses existing text and image generation models. We also introduce a specialized, data-efficient instruction-tuning strategy, tailored for mixed-modal generation tasks. Our final instruct-tuned model demonstrates unparalleled performance in generating high-quality multimodal outputs and represents the first model explicitly designed for this purpose.
    CasIL: Cognizing and Imitating Skills via a Dual Cognition-Action Architecture. (arXiv:2309.16299v1 [cs.RO])
    Enabling robots to effectively imitate expert skills in longhorizon tasks such as locomotion, manipulation, and more, poses a long-standing challenge. Existing imitation learning (IL) approaches for robots still grapple with sub-optimal performance in complex tasks. In this paper, we consider how this challenge can be addressed within the human cognitive priors. Heuristically, we extend the usual notion of action to a dual Cognition (high-level)-Action (low-level) architecture by introducing intuitive human cognitive priors, and propose a novel skill IL framework through human-robot interaction, called Cognition-Action-based Skill Imitation Learning (CasIL), for the robotic agent to effectively cognize and imitate the critical skills from raw visual demonstrations. CasIL enables both cognition and action imitation, while high-level skill cognition explicitly guides low-level primitive actions, providing robustness and reliability to the entire skill IL process. We evaluated our method on MuJoCo and RLBench benchmarks, as well as on the obstacle avoidance and point-goal navigation tasks for quadrupedal robot locomotion. Experimental results show that our CasIL consistently achieves competitive and robust skill imitation capability compared to other counterparts in a variety of long-horizon robotic tasks.
    Method and Validation for Optimal Lineup Creation for Daily Fantasy Football Using Machine Learning and Linear Programming. (arXiv:2309.15253v2 [cs.LG] UPDATED)
    Daily fantasy sports (DFS) are weekly or daily online contests where real-game performances of individual players are converted to fantasy points (FPTS). Users select players for their lineup to maximize their FPTS within a set player salary cap. This paper focuses on (1) the development of a method to forecast NFL player performance under uncertainty and (2) determining an optimal lineup to maximize FPTS under a set salary limit. A supervised learning neural network was created and used to project FPTS based on past player performance (2018 NFL regular season for this work) prior to the upcoming week. These projected FPTS were used in a mixed integer linear program to find the optimal lineup. The performance of resultant lineups was compared to randomly-created lineups. On average, the optimal lineups outperformed the random lineups. The generated lineups were then compared to real-world lineups from users on DraftKings. The generated lineups generally fell in approximately the 31st percentile (median). The FPTS methods and predictions presented here can be further improved using this study as a baseline comparison.
    Leveraging Pre-trained Language Models for Time Interval Prediction in Text-Enhanced Temporal Knowledge Graphs. (arXiv:2309.16357v1 [cs.LG])
    Most knowledge graph completion (KGC) methods learn latent representations of entities and relations of a given graph by mapping them into a vector space. Although the majority of these methods focus on static knowledge graphs, a large number of publicly available KGs contain temporal information stating the time instant/period over which a certain fact has been true. Such graphs are often known as temporal knowledge graphs. Furthermore, knowledge graphs may also contain textual descriptions of entities and relations. Both temporal information and textual descriptions are not taken into account during representation learning by static KGC methods, and only structural information of the graph is leveraged. Recently, some studies have used temporal information to improve link prediction, yet they do not exploit textual descriptions and do not support inductive inference (prediction on entities that have not been seen in training). We propose a novel framework called TEMT that exploits the power of pre-trained language models (PLMs) for text-enhanced temporal knowledge graph completion. The knowledge stored in the parameters of a PLM allows TEMT to produce rich semantic representations of facts and to generalize on previously unseen entities. TEMT leverages textual and temporal information available in a KG, treats them separately, and fuses them to get plausibility scores of facts. Unlike previous approaches, TEMT effectively captures dependencies across different time points and enables predictions on unseen entities. To assess the performance of TEMT, we carried out several experiments including time interval prediction, both in transductive and inductive settings, and triple classification. The experimental results show that TEMT is competitive with the state-of-the-art.
    HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes. (arXiv:2212.10538v2 [cs.LG] UPDATED)
    Bayesian optimization (BO), while proved highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that well model their functions of interest. Rather than specifying by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.
    Quantum Self-Attention Neural Networks for Text Classification. (arXiv:2205.05625v2 [quant-ph] UPDATED)
    An emerging direction of quantum computing is to establish meaningful quantum applications in various fields of artificial intelligence, including natural language processing (NLP). Although some efforts based on syntactic analysis have opened the door to research in Quantum NLP (QNLP), limitations such as heavy syntactic preprocessing and syntax-dependent network architecture make them impracticable on larger and real-world data sets. In this paper, we propose a new simple network architecture, called the quantum self-attention neural network (QSANN), which can compensate for these limitations. Specifically, we introduce the self-attention mechanism into quantum neural networks and then utilize a Gaussian projected quantum self-attention serving as a sensible quantum version of self-attention. As a result, QSANN is effective and scalable on larger data sets and has the desirable property of being implementable on near-term quantum devices. In particular, our QSANN outperforms the best existing QNLP model based on syntactic analysis as well as a simple classical self-attention neural network in numerical experiments of text classification tasks on public data sets. We further show that our method exhibits robustness to low-level quantum noises and showcases resilience to quantum neural network architectures.
    Augment to Interpret: Unsupervised and Inherently Interpretable Graph Embeddings. (arXiv:2309.16564v1 [cs.LG])
    Unsupervised learning allows us to leverage unlabelled data, which has become abundantly available, and to create embeddings that are usable on a variety of downstream tasks. However, the typical lack of interpretability of unsupervised representation learning has become a limiting factor with regard to recent transparent-AI regulations. In this paper, we study graph representation learning and we show that data augmentation that preserves semantics can be learned and used to produce interpretations. Our framework, which we named INGENIOUS, creates inherently interpretable embeddings and eliminates the need for costly additional post-hoc analysis. We also introduce additional metrics addressing the lack of formalism and metrics in the understudied area of unsupervised-representation learning interpretability. Our results are supported by an experimental study applied to both graph-level and node-level tasks and show that interpretable embeddings provide state-of-the-art performance on subsequent downstream tasks.
    Unsupervised Fact Verification by Language Model Distillation. (arXiv:2309.16540v1 [cs.CL])
    Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful, and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.
    Reusability report: Prostate cancer stratification with diverse biologically-informed neural architectures. (arXiv:2309.16645v1 [cs.LG])
    In, Elmarakeby et al., "Biologically informed deep neural network for prostate cancer discovery", a feedforward neural network with biologically informed, sparse connections (P-NET) was presented to model the state of prostate cancer. We verified the reproducibility of the study conducted by Elmarakeby et al., using both their original codebase, and our own re-implementation using more up-to-date libraries. We quantified the contribution of network sparsification by Reactome biological pathways, and confirmed its importance to P-NET's superior performance. Furthermore, we explored alternative neural architectures and approaches to incorporating biological information into the networks. We experimented with three types of graph neural networks on the same training data, and investigated the clinical prediction agreement between different models. Our analyses demonstrated that deep neural networks with distinct architectures make incorrect predictions for individual patient that are persistent across different initializations of a specific neural architecture. This suggests that different neural architectures are sensitive to different aspects of the data, an important yet under-explored challenge for clinical prediction tasks.
    Using Weak Supervision and Data Augmentation in Question Answering. (arXiv:2309.16175v1 [cs.CL])
    The onset of the COVID-19 pandemic accentuated the need for access to biomedical literature to answer timely and disease-specific questions. During the early days of the pandemic, one of the biggest challenges we faced was the lack of peer-reviewed biomedical articles on COVID-19 that could be used to train machine learning models for question answering (QA). In this paper, we explore the roles weak supervision and data augmentation play in training deep neural network QA models. First, we investigate whether labels generated automatically from the structured abstracts of scholarly papers using an information retrieval algorithm, BM25, provide a weak supervision signal to train an extractive QA model. We also curate new QA pairs using information retrieval techniques, guided by the clinicaltrials.gov schema and the structured abstracts of articles, in the absence of annotated data from biomedical domain experts. Furthermore, we explore augmenting the training data of a deep neural network model with linguistic features from external sources such as lexical databases to account for variations in word morphology and meaning. To better utilize our training data, we apply curriculum learning to domain adaptation, fine-tuning our QA model in stages based on characteristics of the QA pairs. We evaluate our methods in the context of QA models at the core of a system to answer questions about COVID-19.
    End-to-end Risk Prediction of Atrial Fibrillation from the 12-Lead ECG by Deep Neural Networks. (arXiv:2309.16335v1 [cs.LG])
    Background: Atrial fibrillation (AF) is one of the most common cardiac arrhythmias that affects millions of people each year worldwide and it is closely linked to increased risk of cardiovascular diseases such as stroke and heart failure. Machine learning methods have shown promising results in evaluating the risk of developing atrial fibrillation from the electrocardiogram. We aim to develop and evaluate one such algorithm on a large CODE dataset collected in Brazil. Results: The deep neural network model identified patients without indication of AF in the presented ECG but who will develop AF in the future with an AUC score of 0.845. From our survival model, we obtain that patients in the high-risk group (i.e. with the probability of a future AF case being greater than 0.7) are 50% more likely to develop AF within 40 weeks, while patients belonging to the minimal-risk group (i.e. with the probability of a future AF case being less than or equal to 0.1) have more than 85% chance of remaining AF free up until after seven years. Conclusion: We developed and validated a model for AF risk prediction. If applied in clinical practice, the model possesses the potential of providing valuable and useful information in decision-making and patient management processes.
    Universal Sleep Decoder: Aligning awake and sleep neural representation across subjects. (arXiv:2309.16457v1 [cs.LG])
    Decoding memory content from brain activity during sleep has long been a goal in neuroscience. While spontaneous reactivation of memories during sleep in rodents is known to support memory consolidation and offline learning, capturing memory replay in humans is challenging due to the absence of well-annotated sleep datasets and the substantial differences in neural patterns between wakefulness and sleep. To address these challenges, we designed a novel cognitive neuroscience experiment and collected a comprehensive, well-annotated electroencephalography (EEG) dataset from 52 subjects during both wakefulness and sleep. Leveraging this benchmark dataset, we developed the Universal Sleep Decoder (USD) to align neural representations between wakefulness and sleep across subjects. Our model achieves up to 16.6% top-1 zero-shot accuracy on unseen subjects, comparable to decoding performances using individual sleep data. Furthermore, fine-tuning USD on test subjects enhances decoding accuracy to 25.9% top-1 accuracy, a substantial improvement over the baseline chance of 6.7%. Model comparison and ablation analyses reveal that our design choices, including the use of (i) an additional contrastive objective to integrate awake and sleep neural signals and (ii) the pretrain-finetune paradigm to incorporate different subjects, significantly contribute to these performances. Collectively, our findings and methodologies represent a significant advancement in the field of sleep decoding.
    Comparing Active Learning Performance Driven by Gaussian Processes or Bayesian Neural Networks for Constrained Trajectory Exploration. (arXiv:2309.16114v1 [cs.RO])
    Robots with increasing autonomy progress our space exploration capabilities, particularly for in-situ exploration and sampling to stand in for human explorers. Currently, humans drive robots to meet scientific objectives, but depending on the robot's location, the exchange of information and driving commands between the human operator and robot may cause undue delays in mission fulfillment. An autonomous robot encoded with a scientific objective and an exploration strategy incurs no communication delays and can fulfill missions more quickly. Active learning algorithms offer this capability of intelligent exploration, but the underlying model structure varies the performance of the active learning algorithm in accurately forming an understanding of the environment. In this paper, we investigate the performance differences between active learning algorithms driven by Gaussian processes or Bayesian neural networks for exploration strategies encoded on agents that are constrained in their trajectories, like planetary surface rovers. These two active learning strategies were tested in a simulation environment against science-blind strategies to predict the spatial distribution of a variable of interest along multiple datasets. The performance metrics of interest are model accuracy in root mean squared (RMS) error, training time, model convergence, total distance traveled until convergence, and total samples until convergence. Active learning strategies encoded with Gaussian processes require less computation to train, converge to an accurate model more quickly, and propose trajectories of shorter distance, except in a few complex environments in which Bayesian neural networks achieve a more accurate model in the large data regime due to their more expressive functional bases. The paper concludes with advice on when and how to implement either exploration strategy for future space missions.
    Review of Machine Learning Methods for Additive Manufacturing of Functionally Graded Materials. (arXiv:2309.16571v1 [cs.LG])
    Additive manufacturing has revolutionized the manufacturing of complex parts by enabling direct material joining and offers several advantages such as cost-effective manufacturing of complex parts, reducing manufacturing waste, and opening new possibilities for manufacturing automation. One group of materials for which additive manufacturing holds great potential for enhancing component performance and properties is Functionally Graded Materials (FGMs). FGMs are advanced composite materials that exhibit smoothly varying properties making them desirable for applications in aerospace, automobile, biomedical, and defense industries. Such composition differs from traditional composite materials, since the location-dependent composition changes gradually in FGMs, leading to enhanced properties. Recently, machine learning techniques have emerged as a promising means for fabrication of FGMs through optimizing processing parameters, improving product quality, and detecting manufacturing defects. This paper first provides a brief literature review of works related to FGM fabrication, followed by reviewing works on employing machine learning in additive manufacturing, Afterward, we provide an overview of published works in the literature related to the application of machine learning methods in Directed Energy Deposition and for fabrication of FGMs.
    EFFL: Egalitarian Fairness in Federated Learning for Mitigating Matthew Effect. (arXiv:2309.16338v1 [cs.LG])
    Recent advances in federated learning (FL) enable collaborative training of machine learning (ML) models from large-scale and widely dispersed clients while protecting their privacy. However, when different clients' datasets are heterogeneous, traditional FL mechanisms produce a global model that does not adequately represent the poorer clients with limited data resources, resulting in lower accuracy and higher bias on their local data. According to the Matthew effect, which describes how the advantaged gain more advantage and the disadvantaged lose more over time, deploying such a global model in client applications may worsen the resource disparity among the clients and harm the principles of social welfare and fairness. To mitigate the Matthew effect, we propose Egalitarian Fairness Federated Learning (EFFL), where egalitarian fairness refers to the global model learned from FL has: (1) equal accuracy among clients; (2) equal decision bias among clients. Besides achieving egalitarian fairness among the clients, EFFL also aims for performance optimality, minimizing the empirical risk loss and the bias for each client; both are essential for any ML model training, whether centralized or decentralized. We formulate EFFL as a constrained multi-constrained multi-objectives optimization (MCMOO) problem, with the decision bias and egalitarian fairness as constraints and the minimization of the empirical risk losses on all clients as multiple objectives to be optimized. We propose a gradient-based three-stage algorithm to obtain the Pareto optimal solutions within the constraint space. Extensive experiments demonstrate that EFFL outperforms other state-of-the-art FL algorithms in achieving a high-performance global model with enhanced egalitarian fairness among all clients.
    Task-Oriented Koopman-Based Control with Contrastive Encoder. (arXiv:2309.16077v1 [cs.RO])
    We present task-oriented Koopman-based control that utilizes end-to-end reinforcement learning and contrastive encoder to simultaneously learn the Koopman latent embedding, operator and associated linear controller within an iterative loop. By prioritizing the task cost as main objective for controller learning, we reduce the reliance of controller design on a well-identified model, which extends Koopman control beyond low-dimensional systems to high-dimensional, complex nonlinear systems, including pixel-based scenarios.
    Unmasking the Chameleons: A Benchmark for Out-of-Distribution Detection in Medical Tabular Data. (arXiv:2309.16220v1 [cs.LG])
    Despite their success, Machine Learning (ML) models do not generalize effectively to data not originating from the training distribution. To reliably employ ML models in real-world healthcare systems and avoid inaccurate predictions on out-of-distribution (OOD) data, it is crucial to detect OOD samples. Numerous OOD detection approaches have been suggested in other fields - especially in computer vision - but it remains unclear whether the challenge is resolved when dealing with medical tabular data. To answer this pressing need, we propose an extensive reproducible benchmark to compare different methods across a suite of tests including both near and far OODs. Our benchmark leverages the latest versions of eICU and MIMIC-IV, two public datasets encompassing tens of thousands of ICU patients in several hospitals. We consider a wide array of density-based methods and SOTA post-hoc detectors across diverse predictive architectures, including MLP, ResNet, and Transformer. Our findings show that i) the problem appears to be solved for far-OODs, but remains open for near-OODs; ii) post-hoc methods alone perform poorly, but improve substantially when coupled with distance-based mechanisms; iii) the transformer architecture is far less overconfident compared to MLP and ResNet.
    Compositional Sculpting of Iterative Generative Processes. (arXiv:2309.16115v1 [cs.LG])
    High training costs of generative models and the need to fine-tune them for specific tasks have created a strong interest in model reuse and composition. A key challenge in composing iterative generative processes, such as GFlowNets and diffusion models, is that to realize the desired target distribution, all steps of the generative process need to be coordinated, and satisfy delicate balance conditions. In this work, we propose Compositional Sculpting: a general approach for defining compositions of iterative generative processes. We then introduce a method for sampling from these compositions built on classifier guidance. We showcase ways to accomplish compositional sculpting in both GFlowNets and diffusion models. We highlight two binary operations $\unicode{x2014}$ the harmonic mean ($p_1 \otimes p_2$) and the contrast ($p_1 \unicode{x25D1}\,p_2$) between pairs, and the generalization of these operations to multiple component distributions. We offer empirical results on image and molecular generation tasks.  ( 2 min )
    Differential 2D Copula Approximating Transforms via Sobolev Training: 2-Cats Networks. (arXiv:2309.16391v1 [cs.LG])
    Copulas are a powerful statistical tool that captures dependencies across data dimensions. When applying Copulas, we can estimate multivariate distribution functions by initially estimating independent marginals, an easy task, and then a single copulating function, $C$, to connect the marginals, a hard task. For two-dimensional data, a copula is a two-increasing function of the form $C: (u,v)\in \mathbf{I}^2 \rightarrow \mathbf{I}$, where $\mathbf{I} = [0, 1]$. In this paper, we show how Neural Networks (NNs) can approximate any two-dimensional copula non-parametrically. Our approach, denoted as 2-Cats, is inspired by the Physics-Informed Neural Networks and Sobolev Training literature. Not only do we show that we can estimate the output of a 2d Copula better than the state-of-the-art, our approach is non-parametric and respects the mathematical properties of a Copula $C$.
    LagrangeBench: A Lagrangian Fluid Mechanics Benchmarking Suite. (arXiv:2309.16342v1 [cs.LG])
    Machine learning has been successfully applied to grid-based PDE modeling in various scientific applications. However, learned PDE solvers based on Lagrangian particle discretizations, which are the preferred approach to problems with free surfaces or complex physics, remain largely unexplored. We present LagrangeBench, the first benchmarking suite for Lagrangian particle problems, focusing on temporal coarse-graining. In particular, our contribution is: (a) seven new fluid mechanics datasets (four in 2D and three in 3D) generated with the Smoothed Particle Hydrodynamics (SPH) method including the Taylor-Green vortex, lid-driven cavity, reverse Poiseuille flow, and dam break, each of which includes different physics like solid wall interactions or free surface, (b) efficient JAX-based API with various recent training strategies and neighbors search routine, and (c) JAX implementation of established Graph Neural Networks (GNNs) like GNS and SEGNN with baseline results. Finally, to measure the performance of learned surrogates we go beyond established position errors and introduce physical metrics like kinetic energy MSE and Sinkhorn distance for the particle distribution. Our codebase is available under the URL: https://github.com/tumaer/lagrangebench
    Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces. (arXiv:2309.16597v1 [cs.LG])
    Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.
    Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints. (arXiv:2309.16240v1 [cs.LG])
    The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO by incorporating diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergences and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
    Recent Advances of Differential Privacy in Centralized Deep Learning: A Systematic Survey. (arXiv:2309.16398v1 [cs.LG])
    Differential Privacy has become a widely popular method for data protection in machine learning, especially since it allows formulating strict mathematical privacy guarantees. This survey provides an overview of the state-of-the-art of differentially private centralized deep learning, thorough analyses of recent advances and open problems, as well as a discussion of potential future developments in the field. Based on a systematic literature review, the following topics are addressed: auditing and evaluation methods for private models, improvements of privacy-utility trade-offs, protection against a broad range of threats and attacks, differentially private generative models, and emerging application domains.
    ShapeDBA: Generating Effective Time Series Prototypes using ShapeDTW Barycenter Averaging. (arXiv:2309.16353v1 [cs.LG])
    Time series data can be found in almost every domain, ranging from the medical field to manufacturing and wireless communication. Generating realistic and useful exemplars and prototypes is a fundamental data analysis task. In this paper, we investigate a novel approach to generating realistic and useful exemplars and prototypes for time series data. Our approach uses a new form of time series average, the ShapeDTW Barycentric Average. We therefore turn our attention to accurately generating time series prototypes with a novel approach. The existing time series prototyping approaches rely on the Dynamic Time Warping (DTW) similarity measure such as DTW Barycentering Average (DBA) and SoftDBA. These last approaches suffer from a common problem of generating out-of-distribution artifacts in their prototypes. This is mostly caused by the DTW variant used and its incapability of detecting neighborhood similarities, instead it detects absolute similarities. Our proposed method, ShapeDBA, uses the ShapeDTW variant of DTW, that overcomes this issue. We chose time series clustering, a popular form of time series analysis to evaluate the outcome of ShapeDBA compared to the other prototyping approaches. Coupled with the k-means clustering algorithm, and evaluated on a total of 123 datasets from the UCR archive, our proposed averaging approach is able to achieve new state-of-the-art results in terms of Adjusted Rand Index.
    A Primer on Bayesian Neural Networks: Review and Debates. (arXiv:2309.16314v1 [stat.ML])
    Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
    A framework for paired-sample hypothesis testing for high-dimensional data. (arXiv:2309.16274v1 [stat.ML])
    The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimates each feature's contribution to the final result.
    RealFill: Reference-Driven Generation for Authentic Image Completion. (arXiv:2309.16668v1 [cs.CV])
    Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. See more results on our project page: https://realfill.github.io
    GInX-Eval: Towards In-Distribution Evaluation of Graph Neural Network Explanations. (arXiv:2309.16223v1 [cs.AI])
    Diverse explainability methods of graph neural networks (GNN) have recently been developed to highlight the edges and nodes in the graph that contribute the most to the model predictions. However, it is not clear yet how to evaluate the correctness of those explanations, whether it is from a human or a model perspective. One unaddressed bottleneck in the current evaluation procedure is the problem of out-of-distribution explanations, whose distribution differs from those of the training data. This important issue affects existing evaluation metrics such as the popular faithfulness or fidelity score. In this paper, we show the limitations of faithfulness metrics. We propose GInX-Eval (Graph In-distribution eXplanation Evaluation), an evaluation procedure of graph explanations that overcomes the pitfalls of faithfulness and offers new insights on explainability methods. Using a retraining strategy, the GInX score measures how informative removed edges are for the model and the EdgeRank score evaluates if explanatory edges are correctly ordered by their importance. GInX-Eval verifies if ground-truth explanations are instructive to the GNN model. In addition, it shows that many popular methods, including gradient-based methods, produce explanations that are not better than a random designation of edges as important subgraphs, challenging the findings of current works in the area. Results with GInX-Eval are consistent across multiple datasets and align with human evaluation.
    Predicting Cardiovascular Complications in Post-COVID-19 Patients Using Data-Driven Machine Learning Models. (arXiv:2309.16059v1 [cs.LG])
    The COVID-19 pandemic has globally posed numerous health challenges, notably the emergence of post-COVID-19 cardiovascular complications. This study addresses this by utilizing data-driven machine learning models to predict such complications in 352 post-COVID-19 patients from Iraq. Clinical data, including demographics, comorbidities, lab results, and imaging, were collected and used to construct predictive models. These models, leveraging various machine learning algorithms, demonstrated commendable performance in identifying patients at risk. Early detection through these models promises timely interventions and improved outcomes. In conclusion, this research underscores the potential of data-driven machine learning for predicting post-COVID-19 cardiovascular complications, emphasizing the need for continued validation and research in diverse clinical settings.
    Mixup Your Own Pairs. (arXiv:2309.16633v1 [cs.LG])
    In representation learning, regression has traditionally received less attention than classification. Directly applying representation learning techniques designed for classification to regression often results in fragmented representations in the latent space, yielding sub-optimal performance. In this paper, we argue that the potential of contrastive learning for regression has been overshadowed due to the neglect of two crucial aspects: ordinality-awareness and hardness. To address these challenges, we advocate "mixup your own contrastive pairs for supervised contrastive regression", instead of relying solely on real/augmented samples. Specifically, we propose Supervised Contrastive Learning for Regression with Mixup (SupReMix). It takes anchor-inclusive mixtures (mixup of the anchor and a distinct negative sample) as hard negative pairs and anchor-exclusive mixtures (mixup of two distinct negative samples) as hard positive pairs at the embedding level. This strategy formulates harder contrastive pairs by integrating richer ordinal information. Through extensive experiments on six regression datasets including 2D images, volumetric images, text, tabular data, and time-series signals, coupled with theoretical analysis, we demonstrate that SupReMix pre-training fosters continuous ordered representations of regression data, resulting in significant improvement in regression performance. Furthermore, SupReMix is superior to other approaches in a range of regression challenges including transfer learning, imbalanced training data, and scenarios with fewer training samples.
    The Trickle-down Impact of Reward (In-)consistency on RLHF. (arXiv:2309.16155v1 [cs.CL])
    Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves optimizing against a Reward Model (RM), which itself is trained to reflect human preferences for desirable generations. A notable subject that is understudied is the (in-)consistency of RMs -- whether they can recognize the semantic changes to different prompts and appropriately adapt their reward assignments -- and their impact on the downstream RLHF model. In this paper, we visit a series of research questions relevant to RM inconsistency: (1) How can we measure the consistency of reward models? (2) How consistent are the existing RMs and how can we improve them? (3) In what ways does reward inconsistency influence the chatbots resulting from the RLHF model training? We propose Contrast Instructions -- a benchmarking strategy for the consistency of RM. Each example in Contrast Instructions features a pair of lexically similar instructions with different ground truth responses. A consistent RM is expected to rank the corresponding instruction and response higher than other combinations. We observe that current RMs trained with the standard ranking objective fail miserably on Contrast Instructions compared to average humans. To show that RM consistency can be improved efficiently without using extra training budget, we propose two techniques ConvexDA and RewardFusion, which enhance reward consistency through extrapolation during the RM training and inference stage, respectively. We show that RLHF models trained with a more consistent RM yield more useful responses, suggesting that reward inconsistency exhibits a trickle-down effect on the downstream RLHF process.
    Digital Twin-based Anomaly Detection with Curriculum Learning in Cyber-physical Systems. (arXiv:2309.15995v1 [cs.LG])
    Anomaly detection is critical to ensure the security of cyber-physical systems (CPS). However, due to the increasing complexity of attacks and CPS themselves, anomaly detection in CPS is becoming more and more challenging. In our previous work, we proposed a digital twin-based anomaly detection method, called ATTAIN, which takes advantage of both historical and real-time data of CPS. However, such data vary significantly in terms of difficulty. Therefore, similar to human learning processes, deep learning models (e.g., ATTAIN) can benefit from an easy-to-difficult curriculum. To this end, in this paper, we present a novel approach, named digitaL twin-based Anomaly deTecTion wIth Curriculum lEarning (LATTICE), which extends ATTAIN by introducing curriculum learning to optimize its learning paradigm. LATTICE attributes each sample with a difficulty score, before being fed into a training scheduler. The training scheduler samples batches of training data based on these difficulty scores such that learning from easy to difficult data can be performed. To evaluate LATTICE, we use five publicly available datasets collected from five real-world CPS testbeds. We compare LATTICE with ATTAIN and two other state-of-the-art anomaly detectors. Evaluation results show that LATTICE outperforms the three baselines and ATTAIN by 0.906%-2.367% in terms of the F1 score. LATTICE also, on average, reduces the training time of ATTAIN by 4.2% on the five datasets and is on par with the baselines in terms of detection delay time.
    Identifying Risk Factors for Post-COVID-19 Mental Health Disorders: A Machine Learning Perspective. (arXiv:2309.16055v1 [cs.LG])
    In this study, we leveraged machine learning techniques to identify risk factors associated with post-COVID-19 mental health disorders. Our analysis, based on data collected from 669 patients across various provinces in Iraq, yielded valuable insights. We found that age, gender, and geographical region of residence were significant demographic factors influencing the likelihood of developing mental health disorders in post-COVID-19 patients. Additionally, comorbidities and the severity of COVID-19 illness were important clinical predictors. Psychosocial factors, such as social support, coping strategies, and perceived stress levels, also played a substantial role. Our findings emphasize the complex interplay of multiple factors in the development of mental health disorders following COVID-19 recovery. Healthcare providers and policymakers should consider these risk factors when designing targeted interventions and support systems for individuals at risk. Machine learning-based approaches can provide a valuable tool for predicting and preventing adverse mental health outcomes in post-COVID-19 patients. Further research and prospective studies are needed to validate these findings and enhance our understanding of the long-term psychological impact of the COVID-19 pandemic. This study contributes to the growing body of knowledge regarding the mental health consequences of the COVID-19 pandemic and underscores the importance of a multidisciplinary approach to address the diverse needs of individuals on the path to recovery. Keywords: COVID-19, mental health, risk factors, machine learning, Iraq
    Learning Interpretable Characteristic Kernels via Decision Forests. (arXiv:1812.00029v3 [stat.ML] UPDATED)
    Decision forests are widely used for classification and regression tasks. A lesser known property of tree-based methods is that one can construct a proximity matrix from the tree(s), and these proximity matrices are induced kernels. While there has been extensive research on the applications and properties of kernels, there is relatively little research on kernels induced by decision forests. We construct Kernel Mean Embedding Random Forests (KMERF), which induce kernels from random trees and/or forests using leaf-node proximity. We introduce the notion of an asymptotically characteristic kernel, and prove that KMERF kernels are asymptotically characteristic for both discrete and continuous data. Because KMERF is data-adaptive, we suspected it would outperform kernels selected a priori on finite sample data. We illustrate that KMERF nearly dominates current state-of-the-art kernel-based tests across a diverse range of high-dimensional two-sample and independence testing settings. Furthermore, our forest-based approach is interpretable, and provides feature importance metrics that readily distinguish important dimensions, unlike other high-dimensional non-parametric testing procedures. Hence, this work demonstrates the decision forest-based kernel can be more powerful and more interpretable than existing methods, flying in the face of conventional wisdom of the trade-off between the two.
    Enhancing Cross-Category Learning in Recommendation Systems with Multi-Layer Embedding Training. (arXiv:2309.15881v1 [cs.LG])
    Modern DNN-based recommendation systems rely on training-derived embeddings of sparse features. Input sparsity makes obtaining high-quality embeddings for rarely-occurring categories harder as their representations are updated infrequently. We demonstrate a training-time technique to produce superior embeddings via effective cross-category learning and theoretically explain its surprising effectiveness. The scheme, termed the multi-layer embeddings training (MLET), trains embeddings using factorization of the embedding layer, with an inner dimension higher than the target embedding dimension. For inference efficiency, MLET converts the trained two-layer embedding into a single-layer one thus keeping inference-time model size unchanged. Empirical superiority of MLET is puzzling as its search space is not larger than that of the single-layer embedding. The strong dependence of MLET on the inner dimension is even more surprising. We develop a theory that explains both of these behaviors by showing that MLET creates an adaptive update mechanism modulated by the singular vectors of embeddings. When tested on multiple state-of-the-art recommendation models for click-through rate (CTR) prediction tasks, MLET consistently produces better models, especially for rare items. At constant model quality, MLET allows embedding dimension, and model size, reduction by up to 16x, and 5.8x on average, across the models.
    Intrinsic Language-Guided Exploration for Complex Long-Horizon Robotic Manipulation Tasks. (arXiv:2309.16347v1 [cs.RO])
    Current reinforcement learning algorithms struggle in sparse and complex environments, most notably in long-horizon manipulation tasks entailing a plethora of different sequences. In this work, we propose the Intrinsically Guided Exploration from Large Language Models (IGE-LLMs) framework. By leveraging LLMs as an assistive intrinsic reward, IGE-LLMs guides the exploratory process in reinforcement learning to address intricate long-horizon with sparse rewards robotic manipulation tasks. We evaluate our framework and related intrinsic learning methods in an environment challenged with exploration, and a complex robotic manipulation task challenged by both exploration and long-horizons. Results show IGE-LLMs (i) exhibit notably higher performance over related intrinsic methods and the direct use of LLMs in decision-making, (ii) can be combined and complement existing learning methods highlighting its modularity, (iii) are fairly insensitive to different intrinsic scaling parameters, and (iv) maintain robustness against increased levels of uncertainty and horizons.
    HyperPPO: A scalable method for finding small policies for robotic control. (arXiv:2309.16663v1 [cs.RO])
    Models with fewer parameters are necessary for the neural control of memory-limited, performant robots. Finding these smaller neural network architectures can be time-consuming. We propose HyperPPO, an on-policy reinforcement learning algorithm that utilizes graph hypernetworks to estimate the weights of multiple neural architectures simultaneously. Our method estimates weights for networks that are much smaller than those in common-use networks yet encode highly performant policies. We obtain multiple trained policies at the same time while maintaining sample efficiency and provide the user the choice of picking a network architecture that satisfies their computational constraints. We show that our method scales well - more training resources produce faster convergence to higher-performing architectures. We demonstrate that the neural policies estimated by HyperPPO are capable of decentralized control of a Crazyflie2.1 quadrotor. Website: https://sites.google.com/usc.edu/hyperppo
    Identifying factors associated with fast visual field progression in patients with ocular hypertension based on unsupervised machine learning. (arXiv:2309.15867v1 [cs.LG])
    Purpose: To identify ocular hypertension (OHT) subtypes with different trends of visual field (VF) progression based on unsupervised machine learning and to discover factors associated with fast VF progression. Participants: A total of 3133 eyes of 1568 ocular hypertension treatment study (OHTS) participants with at least five follow-up VF tests were included in the study. Methods: We used a latent class mixed model (LCMM) to identify OHT subtypes using standard automated perimetry (SAP) mean deviation (MD) trajectories. We characterized the subtypes based on demographic, clinical, ocular, and VF factors at the baseline. We then identified factors driving fast VF progression using generalized estimating equation (GEE) and justified findings qualitatively and quantitatively. Results: The LCMM model discovered four clusters (subtypes) of eyes with different trajectories of MD worsening. The number of eyes in clusters were 794 (25%), 1675 (54%), 531 (17%) and 133 (4%). We labelled the clusters as Improvers, Stables, Slow progressors, and Fast progressors based on their mean of MD decline, which were 0.08, -0.06, -0.21, and -0.45 dB/year, respectively. Eyes with fast VF progression had higher baseline age, intraocular pressure (IOP), pattern standard deviation (PSD) and refractive error (RE), but lower central corneal thickness (CCT). Fast progression was associated with calcium channel blockers, being male, heart disease history, diabetes history, African American race, stroke history, and migraine headaches.
    A novel approach to measuring patent claim scope based on probabilities obtained from (large) language models. (arXiv:2309.10003v2 [cs.CL] UPDATED)
    This work proposes to measure the scope of a patent claim as the reciprocal of the self-information contained in this claim. A probability of occurrence of the claim is obtained from a language model and this probability is used to compute the self-information. Grounded in information theory, this approach is based on the assumption that an unlikely concept is more informative than a usual concept, insofar as it is more surprising. In turn, the more surprising the information required to defined the claim, the narrower its scope. Five language models are considered, ranging from simplest models (each word or character is assigned an identical probability) to intermediate models (using average word or character frequencies), to a large language model (GPT2). Interestingly, the scope resulting from the simplest language models is proportional to the reciprocal of the number of words or characters involved in the claim, a metric already used in previous works. Application is made to multiple series of patent claims directed to distinct inventions, where each series consists of claims devised to have a gradually decreasing scope. The performance of the language models is assessed with respect to several ad hoc tests. The more sophisticated the model, the better the results. I.e., the GPT2 probability model outperforms models based on word and character frequencies, which themselves outdo the simplest models based on word or character counts. Still, the character count appears to be a more reliable indicator than the word count.
    GNNHLS: Evaluating Graph Neural Network Inference via High-Level Synthesis. (arXiv:2309.16022v1 [cs.LG])
    With the ever-growing popularity of Graph Neural Networks (GNNs), efficient GNN inference is gaining tremendous attention. Field-Programming Gate Arrays (FPGAs) are a promising execution platform due to their fine-grained parallelism, low-power consumption, reconfigurability, and concurrent execution. Even better, High-Level Synthesis (HLS) tools bridge the gap between the non-trivial FPGA development efforts and rapid emergence of new GNN models. In this paper, we propose GNNHLS, an open-source framework to comprehensively evaluate GNN inference acceleration on FPGAs via HLS, containing a software stack for data generation and baseline deployment, and FPGA implementations of 6 well-tuned GNN HLS kernels. We evaluate GNNHLS on 4 graph datasets with distinct topologies and scales. The results show that GNNHLS achieves up to 50.8x speedup and 423x energy reduction relative to the CPU baselines. Compared with the GPU baselines, GNNHLS achieves up to 5.16x speedup and 74.5x energy reduction.
    Abdominal multi-organ segmentation in CT using Swinunter. (arXiv:2309.16210v1 [eess.IV])
    Abdominal multi-organ segmentation in computed tomography (CT) is crucial for many clinical applications including disease detection and treatment planning. Deep learning methods have shown unprecedented performance in this perspective. However, it is still quite challenging to accurately segment different organs utilizing a single network due to the vague boundaries of organs, the complex background, and the substantially different organ size scales. In this work we used make transformer-based model for training. It was found through previous years' competitions that basically all of the top 5 methods used CNN-based methods, which is likely due to the lack of data volume that prevents transformer-based methods from taking full advantage. The thousands of samples in this competition may enable the transformer-based model to have more excellent results. The results on the public validation set also show that the transformer-based model can achieve an acceptable result and inference time.
    ModuLoRA: Finetuning 3-Bit LLMs on Consumer GPUs by Integrating with Modular Quantizers. (arXiv:2309.16119v1 [cs.LG])
    We propose a memory-efficient finetuning algorithm for large language models (LLMs) that supports finetuning LLMs with 65B parameters in 3-bit or 4-bit precision on as little as one 48GB GPU. Our method, modular low-rank adaptation (ModuLoRA), integrates any user-specified weight quantizer with finetuning via low-rank adapters (LoRAs). Our approach relies on a simple quantization-agnostic backward pass that adaptively materializes low-precision LLM weights from a custom black-box quantization module. This approach enables finetuning 3-bit LLMs for the first time--leveraging state-of-the-art 3-bit OPTQ quantization often outperforms finetuning that relies on less sophisticated 4-bit and 8-bit methods. In our experiments, ModuLoRA attains competitive performance on text classification, natural language infernece, and instruction following tasks using significantly less memory than existing approaches, and we also surpass the state-of-the-art ROUGE score on a popular summarization task. We release ModuLoRA together with a series of low-precision models--including the first family of 3-bit instruction following Alpaca LLMs--as part of LLMTOOLS, a user-friendly library for quantizing, running, and finetuning LLMs on consumer GPUs.
    Advancing Federated Learning in 6G: A Trusted Architecture with Graph-based Analysis. (arXiv:2309.05525v3 [cs.NI] UPDATED)
    Integrating native AI support into the network architecture is an essential objective of 6G. Federated Learning (FL) emerges as a potential paradigm, facilitating decentralized AI model training across a diverse range of devices under the coordination of a central server. However, several challenges hinder its wide application in the 6G context, such as malicious attacks and privacy snooping on local model updates, and centralization pitfalls. This work proposes a trusted architecture for supporting FL, which utilizes Distributed Ledger Technology (DLT) and Graph Neural Network (GNN), including three key features. First, a pre-processing layer employing homomorphic encryption is incorporated to securely aggregate local models, preserving the privacy of individual models. Second, given the distributed nature and graph structure between clients and nodes in the pre-processing layer, GNN is leveraged to identify abnormal local models, enhancing system security. Third, DLT is utilized to decentralize the system by selecting one of the candidates to perform the central server's functions. Additionally, DLT ensures reliable data management by recording data exchanges in an immutable and transparent ledger. The feasibility of the novel architecture is validated through simulations, demonstrating improved performance in anomalous model detection and global model accuracy compared to relevant baselines.
    Data Augmentation in the Underparameterized and Overparameterized Regimes. (arXiv:2202.09134v3 [cs.LG] UPDATED)
    We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties data augmentation has been attributed with are not either true or false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables.
    Analytical Modelling of Raw Data for Flow-Guided In-body Nanoscale Localization. (arXiv:2309.16034v1 [cs.ET])
    Advancements in nanotechnology and material science are paving the way toward nanoscale devices that combine sensing, computing, data and energy storage, and wireless communication. In precision medicine, these nanodevices show promise for disease diagnostics, treatment, and monitoring from within the patients' bloodstreams. Assigning the location of a sensed biological event with the event itself, which is the main proposition of flow-guided in-body nanoscale localization, would be immensely beneficial from the perspective of precision medicine. The nanoscale nature of the nanodevices and the challenging environment that the bloodstream represents, result in current flow-guided localization approaches being constrained in their communication and energy-related capabilities. The communication and energy constraints of the nanodevices result in different features of raw data for flow-guided localization, in turn affecting its performance. An analytical modeling of the effects of imperfect communication and constrained energy causing intermittent operation of the nanodevices on the raw data produced by the nanodevices would be beneficial. Hence, we propose an analytical model of raw data for flow-guided localization, where the raw data is modeled as a function of communication and energy-related capabilities of the nanodevice. We evaluate the model by comparing its output with the one obtained through the utilization of a simulator for objective evaluation of flow-guided localization, featuring comparably higher level of realism. Our results across a number of scenarios and heterogeneous performance metrics indicate high similarity between the model and simulator-generated raw datasets.
    RLLTE: Long-Term Evolution Project of Reinforcement Learning. (arXiv:2309.16382v1 [cs.AI])
    We present RLLTE: a long-term evolution, extremely modular, and open-source framework for reinforcement learning (RL) research and application. Beyond delivering top-notch algorithm implementations, RLLTE also serves as a toolkit for developing algorithms. More specifically, RLLTE decouples the RL algorithms completely from the exploitation-exploration perspective, providing a large number of components to accelerate algorithm development and evolution. In particular, RLLTE is the first RL framework to build a complete and luxuriant ecosystem, which includes model training, evaluation, deployment, benchmark hub, and large language model (LLM)-empowered copilot. RLLTE is expected to set standards for RL engineering practice and be highly stimulative for industry and academia.
    Flexible and efficient spatial extremes emulation via variational autoencoders. (arXiv:2307.08079v2 [stat.ML] UPDATED)
    Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often exceedingly prohibitive to fit and simulate from in high dimensions. In this paper, we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. As an aside, our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. To further demonstrate the computational power of the XVAE, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We find that the extremal dependence strength is weaker in the interior of Red Sea and it has decreased slightly over time.
    DPA-WNO: A gray box model for a class of stochastic mechanics problem. (arXiv:2309.15128v2 [cs.LG] UPDATED)
    The well-known governing physics in science and engineering is often based on certain assumptions and approximations. Therefore, analyses and designs carried out based on these equations are also approximate. The emergence of data-driven models has, to a certain degree, addressed this challenge; however, the purely data-driven models often (a) lack interpretability, (b) are data-hungry, and (c) do not generalize beyond the training window. Operator learning has recently been proposed as a potential alternative to address the aforementioned challenges; however, the challenges are still persistent. We here argue that one of the possible solutions resides in data-physics fusion, where the data-driven model is used to correct/identify the missing physics. To that end, we propose a novel Differentiable Physics Augmented Wavelet Neural Operator (DPA-WNO). The proposed DPA-WNO blends a differentiable physics solver with the Wavelet Neural Operator (WNO), where the role of WNO is to model the missing physics. This empowers the proposed framework to exploit the capability of WNO to learn from data while retaining the interpretability and generalizability associated with physics-based solvers. We illustrate the applicability of the proposed approach in solving time-dependent uncertainty quantification problems due to randomness in the initial condition. Four benchmark uncertainty quantification and reliability analysis examples from various fields of science and engineering are solved using the proposed approach. The results presented illustrate interesting features of the proposed approach.
    Lossless Transformations and Excess Risk Bounds in Statistical Inference. (arXiv:2307.16735v2 [cs.IT] UPDATED)
    We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and the minimum expected loss in estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless and show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a delta-lossless transformation and give sufficient conditions for a given transformation to be universally delta-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottleneck, and deep learning, are also surveyed.
    Uncovering Neural Scaling Laws in Molecular Representation Learning. (arXiv:2309.15123v2 [physics.chem-ph] UPDATED)
    Molecular Representation Learning (MRL) has emerged as a powerful tool for drug and materials discovery in a variety of tasks such as virtual screening and inverse design. While there has been a surge of interest in advancing model-centric techniques, the influence of both data quantity and quality on molecular representations is not yet clearly understood within this field. In this paper, we delve into the neural scaling behaviors of MRL from a data-centric viewpoint, examining four key dimensions: (1) data modalities, (2) dataset splitting, (3) the role of pre-training, and (4) model capacity. Our empirical studies confirm a consistent power-law relationship between data volume and MRL performance across these dimensions. Additionally, through detailed analysis, we identify potential avenues for improving learning efficiency. To challenge these scaling laws, we adapt seven popular data pruning strategies to molecular data and benchmark their performance. Our findings underline the importance of data-centric MRL and highlight possible directions for future research.
    Telescope: An Automated Hybrid Forecasting Approach on a Level-Playing Field. (arXiv:2309.15871v1 [cs.LG])
    In many areas of decision-making, forecasting is an essential pillar. Consequently, many different forecasting methods have been proposed. From our experience, recently presented forecasting methods are computationally intensive, poorly automated, tailored to a particular data set, or they lack a predictable time-to-result. To this end, we introduce Telescope, a novel machine learning-based forecasting approach that automatically retrieves relevant information from a given time series and splits it into parts, handling each of them separately. In contrast to deep learning methods, our approach doesn't require parameterization or the need to train and fit a multitude of parameters. It operates with just one time series and provides forecasts within seconds without any additional setup. Our experiments show that Telescope outperforms recent methods by providing accurate and reliable forecasts while making no assumptions about the analyzed time series.
    STAEformer: Spatio-Temporal Adaptive Embedding Makes Vanilla Transformer SOTA for Traffic Forecasting. (arXiv:2308.10425v4 [cs.LG] UPDATED)
    With the rapid development of the Intelligent Transportation System (ITS), accurate traffic forecasting has emerged as a critical challenge. The key bottleneck lies in capturing the intricate spatio-temporal traffic patterns. In recent years, numerous neural networks with complicated architectures have been proposed to address this issue. However, the advancements in network architectures have encountered diminishing performance gains. In this study, we present a novel component called spatio-temporal adaptive embedding that can yield outstanding results with vanilla transformers. Our proposed Spatio-Temporal Adaptive Embedding transformer (STAEformer) achieves state-of-the-art performance on five real-world traffic forecasting datasets. Further experiments demonstrate that spatio-temporal adaptive embedding plays a crucial role in traffic forecasting by effectively capturing intrinsic spatio-temporal relations and chronological information in traffic time series.
    Kairos: Practical Intrusion Detection and Investigation using Whole-system Provenance. (arXiv:2308.05034v3 [cs.CR] UPDATED)
    Provenance graphs are structured audit logs that describe the history of a system's execution. Recent studies have explored a variety of techniques to analyze provenance graphs for automated host intrusion detection, focusing particularly on advanced persistent threats. Sifting through their design documents, we identify four common dimensions that drive the development of provenance-based intrusion detection systems (PIDSes): scope (can PIDSes detect modern attacks that infiltrate across application boundaries?), attack agnosticity (can PIDSes detect novel attacks without a priori knowledge of attack characteristics?), timeliness (can PIDSes efficiently monitor host systems as they run?), and attack reconstruction (can PIDSes distill attack activity from large provenance graphs so that sysadmins can easily understand and quickly respond to system intrusion?). We present KAIROS, the first PIDS that simultaneously satisfies the desiderata in all four dimensions, whereas existing approaches sacrifice at least one and struggle to achieve comparable detection performance. Kairos leverages a novel graph neural network-based encoder-decoder architecture that learns the temporal evolution of a provenance graph's structural changes to quantify the degree of anomalousness for each system event. Then, based on this fine-grained information, Kairos reconstructs attack footprints, generating compact summary graphs that accurately describe malicious activity over a stream of system audit logs. Using state-of-the-art benchmark datasets, we demonstrate that Kairos outperforms previous approaches.
    Developing a Philosophical Framework for Fair Machine Learning: Lessons From The Case of Algorithmic Collusion. (arXiv:2208.06308v2 [cs.LG] UPDATED)
    Fair machine learning research has been primarily concerned with classification tasks that result in discrimination. However, as machine learning algorithms are applied in new contexts the harms and injustices that result are qualitatively different than those presently studied. The existing research paradigm in machine learning which develops metrics and definitions of fairness cannot account for these qualitatively different types of injustice. One example of this is the problem of algorithmic collusion and market fairness. The negative consequences of algorithmic collusion affect all consumers, not only particular members of a protected class. Drawing on this case study, I propose an ethical framework for researchers and practitioners in machine learning seeking to develop and apply fairness metrics that extends to new domains. This contribution ties the development of formal metrics of fairness to specifically scoped normative principles. This enables fairness metrics to reflect different concerns from discrimination. I conclude with the limitations of my proposal and discuss promising avenues for future research.
    Creating walls to avoid unwanted points in root finding and optimization. (arXiv:2309.11475v2 [math.OC] UPDATED)
    In root finding and optimization, there are many cases where there is a closed set $A$ one likes that the sequence constructed by one's favourite method will not converge to A (here, we do not assume extra properties on $A$ such as being convex or connected). For example, if one wants to find roots, and one chooses initial points in the basin of attraction for 1 root $x^*$ (a fact which one may not know before hand), then one will always end up in that root. In this case, one would like to have a mechanism to avoid this point $z^*$ in the next runs of one's algorithm. In this paper, we propose two new methods aiming to achieve this. In the first method, we divide the cost function by an appropriate power of the distance function to $A$. This idea is inspired by how one would try to find all roots of a function in 1 variable. In the second method, which is more suitable for constrained optimization, we redefine the value of the function to be a big constant on $A$. We also propose, based on this, an algorithm to escape the basin of attraction of a component of positive dimension to reach another component. As an application, we prove a rigorous guarantee for finding roots of a meromorphic function of 1 complex variable in a given domain. Along the way, we compare with main existing relevant methods in the current literature. We provide several examples in various different settings to illustrate the usefulness of the new approach.
    Improving Robustness of Deep Convolutional Neural Networks via Multiresolution Learning. (arXiv:2309.13752v2 [cs.LG] UPDATED)
    The current learning process of deep learning, regardless of any deep neural network (DNN) architecture and/or learning algorithm used, is essentially a single resolution training. We explore multiresolution learning and show that multiresolution learning can significantly improve robustness of DNN models for both 1D signal and 2D signal (image) prediction problems. We demonstrate this improvement in terms of both noise and adversarial robustness as well as with small training dataset size. Our results also suggest that it may not be necessary to trade standard accuracy for robustness with multiresolution learning, which is, interestingly, contrary to the observation obtained from the traditional single resolution learning setting.
    HACMan: Learning Hybrid Actor-Critic Maps for 6D Non-Prehensile Manipulation. (arXiv:2305.03942v3 [cs.RO] UPDATED)
    Manipulating objects without grasping them is an essential component of human dexterity, referred to as non-prehensile manipulation. Non-prehensile manipulation may enable more complex interactions with the objects, but also presents challenges in reasoning about gripper-object interactions. In this work, we introduce Hybrid Actor-Critic Maps for Manipulation (HACMan), a reinforcement learning approach for 6D non-prehensile manipulation of objects using point cloud observations. HACMan proposes a temporally-abstracted and spatially-grounded object-centric action representation that consists of selecting a contact location from the object point cloud and a set of motion parameters describing how the robot will move after making contact. We modify an existing off-policy RL algorithm to learn in this hybrid discrete-continuous action representation. We evaluate HACMan on a 6D object pose alignment task in both simulation and in the real world. On the hardest version of our task, with randomized initial poses, randomized 6D goals, and diverse object categories, our policy demonstrates strong generalization to unseen object categories without a performance drop, achieving an 89% success rate on unseen objects in simulation and 50% success rate with zero-shot transfer in the real world. Compared to alternative action representations, HACMan achieves a success rate more than three times higher than the best baseline. With zero-shot sim2real transfer, our policy can successfully manipulate unseen objects in the real world for challenging non-planar goals, using dynamic and contact-rich non-prehensile skills. Videos can be found on the project website: https://hacman-2023.github.io.
    A Graph Neural Network-Based QUBO-Formulated Hamiltonian-Inspired Loss Function for Combinatorial Optimization using Reinforcement Learning. (arXiv:2308.13978v2 [cs.AI] UPDATED)
    Quadratic Unconstrained Binary Optimization (QUBO) is a generic technique to model various NP-hard combinatorial optimization problems in the form of binary variables. The Hamiltonian function is often used to formulate QUBO problems where it is used as the objective function in the context of optimization. Recently, PI-GNN, a generic scalable framework, has been proposed to address the Combinatorial Optimization (CO) problems over graphs based on a simple Graph Neural Network (GNN) architecture. Their novel contribution was a generic QUBO-formulated Hamiltonian-inspired loss function that was optimized using GNN. In this study, we address a crucial issue related to the aforementioned setup especially observed in denser graphs. The reinforcement learning-based paradigm has also been widely used to address numerous CO problems. Here we also formulate and empirically evaluate the compatibility of the QUBO-formulated Hamiltonian as the generic reward function in the Reinforcement Learning paradigm to directly integrate the actual node projection status during training as the form of rewards. In our experiments, we observed up to 44% improvement in the RL-based setup compared to the PI-GNN algorithm. Our implementation can be found in https://github.com/rizveeredwan/learning-graph-structure.
    Temporal Graph Benchmark for Machine Learning on Temporal Graphs. (arXiv:2307.01026v2 [cs.LG] UPDATED)
    We present the Temporal Graph Benchmark (TGB), a collection of challenging and diverse benchmark datasets for realistic, reproducible, and robust evaluation of machine learning models on temporal graphs. TGB datasets are of large scale, spanning years in duration, incorporate both node and edge-level prediction tasks and cover a diverse set of domains including social, trade, transaction, and transportation networks. For both tasks, we design evaluation protocols based on realistic use-cases. We extensively benchmark each dataset and find that the performance of common models can vary drastically across datasets. In addition, on dynamic node property prediction tasks, we show that simple methods often achieve superior performance compared to existing temporal graph models. We believe that these findings open up opportunities for future research on temporal graphs. Finally, TGB provides an automated machine learning pipeline for reproducible and accessible temporal graph research, including data loading, experiment setup and performance evaluation. TGB will be maintained and updated on a regular basis and welcomes community feedback. TGB datasets, data loaders, example codes, evaluation setup, and leaderboards are publicly available at https://tgb.complexdatalab.com/.
    Model Sparsity Can Simplify Machine Unlearning. (arXiv:2304.04934v8 [cs.LG] UPDATED)
    In response to recent data regulation requirements, machine unlearning (MU) has emerged as a critical process to remove the influence of specific examples from a given model. Although exact unlearning can be achieved through complete model retraining using the remaining dataset, the associated computational costs have driven the development of efficient, approximate unlearning techniques. Moving beyond data-centric MU approaches, our study introduces a novel model-based perspective: model sparsification via weight pruning, which is capable of reducing the gap between exact unlearning and approximate unlearning. We show in both theory and practice that model sparsity can boost the multi-criteria unlearning performance of an approximate unlearner, closing the approximation gap, while continuing to be efficient. This leads to a new MU paradigm, termed prune first, then unlearn, which infuses a sparse model prior into the unlearning process. Building on this insight, we also develop a sparsity-aware unlearning method that utilizes sparsity regularization to enhance the training process of approximate unlearning. Extensive experiments show that our proposals consistently benefit MU in various unlearning scenarios. A notable highlight is the 77% unlearning efficacy gain of fine-tuning (one of the simplest unlearning methods) when using sparsity-aware unlearning. Furthermore, we demonstrate the practical impact of our proposed MU methods in addressing other machine learning challenges, such as defending against backdoor attacks and enhancing transfer learning. Codes are available at https://github.com/OPTML-Group/Unlearn-Sparse.
    Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions. (arXiv:2305.18471v2 [cs.LG] UPDATED)
    We provide a simple convergence proof for AdaGrad optimizing non-convex objectives under only affine noise variance and bounded smoothness assumptions. The proof is essentially based on a novel auxiliary function $\xi$ that helps eliminate the complexity of handling the correlation between the numerator and denominator of AdaGrad's update. Leveraging simple proofs, we are able to obtain tighter results than existing results \citep{faw2022power} and extend the analysis to several new and important cases. Specifically, for the over-parameterized regime, we show that AdaGrad needs only $\mathcal{O}(\frac{1}{\varepsilon^2})$ iterations to ensure the gradient norm smaller than $\varepsilon$, which matches the rate of SGD and significantly tighter than existing rates $\mathcal{O}(\frac{1}{\varepsilon^4})$ for AdaGrad. We then discard the bounded smoothness assumption and consider a realistic assumption on smoothness called $(L_0,L_1)$-smooth condition, which allows local smoothness to grow with the gradient norm. Again based on the auxiliary function $\xi$, we prove that AdaGrad succeeds in converging under $(L_0,L_1)$-smooth condition as long as the learning rate is lower than a threshold. Interestingly, we further show that the requirement on learning rate under the $(L_0,L_1)$-smooth condition is necessary via proof by contradiction, in contrast with the case of uniform smoothness conditions where convergence is guaranteed regardless of learning rate choices. Together, our analyses broaden the understanding of AdaGrad and demonstrate the power of the new auxiliary function in the investigations of AdaGrad.
    Transformer-VQ: Linear-Time Transformers via Vector Quantization. (arXiv:2309.16354v1 [cs.LG])
    We introduce Transformer-VQ, a decoder-only transformer computing softmax-based dense self-attention in linear time. Transformer-VQ's efficient attention is enabled by vector-quantized keys and a novel caching mechanism. In large-scale experiments, Transformer-VQ is shown highly competitive in quality, with strong results on Enwik8 (0.99 bpb), PG-19 (26.6 ppl), and ImageNet64 (3.16 bpb). Code: https://github.com/transformer-vq/transformer_vq
    Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses. (arXiv:2209.07403v4 [cs.LG] UPDATED)
    We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work (Wang et al., 2020; Kamath et al., 2022), we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the $k$-th moment bound instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to (Wang et al., 2020; Kamath et al., 2022), our bounds do not require the loss function to be differentiable/smooth. We also devise a linear-time algorithm for smooth losses that has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
    Disambiguated Attention Embedding for Multi-Instance Partial-Label Learning. (arXiv:2305.16912v2 [cs.LG] UPDATED)
    In many real-world tasks, the concerned objects can be represented as a multi-instance bag associated with a candidate label set, which consists of one ground-truth label and several false positive labels. Multi-instance partial-label learning (MIPL) is a learning paradigm to deal with such tasks and has achieved favorable performances. Existing MIPL approach follows the instance-space paradigm by assigning augmented candidate label sets of bags to each instance and aggregating bag-level labels from instance-level labels. However, this scheme may be suboptimal as global bag-level information is ignored and the predicted labels of bags are sensitive to predictions of negative instances. In this paper, we study an alternative scheme where a multi-instance bag is embedded into a single vector representation. Accordingly, an intuitive algorithm named DEMIPL, i.e., Disambiguated attention Embedding for Multi-Instance Partial-Label learning, is proposed. DEMIPL employs a disambiguation attention mechanism to aggregate a multi-instance bag into a single vector representation, followed by a momentum-based disambiguation strategy to identify the ground-truth label from the candidate label set. Furthermore, we introduce a real-world MIPL dataset for colorectal cancer classification. Experimental results on benchmark and real-world datasets validate the superiority of DEMIPL against the compared MIPL and partial-label learning approaches.
    Online Distribution Shift Detection via Recency Prediction. (arXiv:2211.09916v3 [cs.RO] UPDATED)
    When deploying modern machine learning-enabled robotic systems in high-stakes applications, detecting distribution shift is critical. However, most existing methods for detecting distribution shift are not well-suited to robotics settings, where data often arrives in a streaming fashion and may be very high-dimensional. In this work, we present an online method for detecting distribution shift with guarantees on the false positive rate - i.e., when there is no distribution shift, our system is very unlikely (with probability $< \epsilon$) to falsely issue an alert; any alerts that are issued should therefore be heeded. Our method is specifically designed for efficient detection even with high dimensional data, and it empirically achieves up to 11x faster detection on realistic robotics settings compared to prior work while maintaining a low false negative rate in practice (whenever there is a distribution shift in our experiments, our method indeed emits an alert). We demonstrate our approach in both simulation and hardware for a visual servoing task, and show that our method indeed issues an alert before a failure occurs.
    Is My Prediction Arbitrary? Confounding Effects of Variance in Fair Classification. (arXiv:2301.11562v5 [cs.LG] UPDATED)
    Variance in predictions across different trained models is a significant, under-explored source of error in fair classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fairness classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply common fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should fundamentally reconsider how we choose to measure fairness in machine learning.
    On Learning with LAD. (arXiv:2309.16630v1 [cs.LG])
    The logical analysis of data, LAD, is a technique that yields two-class classifiers based on Boolean functions having disjunctive normal form (DNF) representation. Although LAD algorithms employ optimization techniques, the resulting binary classifiers or binary rules do not lead to overfitting. We propose a theoretical justification for the absence of overfitting by estimating the Vapnik-Chervonenkis dimension (VC dimension) for LAD models where hypothesis sets consist of DNFs with a small number of cubic monomials. We illustrate and confirm our observations empirically.
    Open Source Infrastructure for Differentiable Density Functional Theory. (arXiv:2309.15985v1 [cs.LG])
    Learning exchange correlation functionals, used in quantum chemistry calculations, from data has become increasingly important in recent years, but training such a functional requires sophisticated software infrastructure. For this reason, we build open source infrastructure to train neural exchange correlation functionals. We aim to standardize the processing pipeline by adapting state-of-the-art techniques from work done by multiple groups. We have open sourced the model in the DeepChem library to provide a platform for additional research on differentiable quantum chemistry methods.
    Dice Semimetric Losses: Optimizing the Dice Score with Soft Labels. (arXiv:2303.16296v3 [cs.CV] UPDATED)
    The soft Dice loss (SDL) has taken a pivotal role in numerous automated segmentation pipelines in the medical imaging community. Over the last years, some reasons behind its superior functioning have been uncovered and further optimizations have been explored. However, there is currently no implementation that supports its direct utilization in scenarios involving soft labels. Hence, a synergy between the use of SDL and research leveraging the use of soft labels, also in the context of model calibration, is still missing. In this work, we introduce Dice semimetric losses (DMLs), which (i) are by design identical to SDL in a standard setting with hard labels, but (ii) can be employed in settings with soft labels. Our experiments on the public QUBIQ, LiTS and KiTS benchmarks confirm the potential synergy of DMLs with soft labels (e.g.\ averaging, label smoothing, and knowledge distillation) over hard labels (e.g.\ majority voting and random selection). As a result, we obtain superior Dice scores and model calibration, which supports the wider adoption of DMLs in practice. The code is available at \href{https://github.com/zifuwanggg/JDTLosses}{https://github.com/zifuwanggg/JDTLosses}.
    Unsupervised Discovery of Extreme Weather Events Using Universal Representations of Emergent Organization. (arXiv:2304.12586v2 [physics.comp-ph] UPDATED)
    Spontaneous self-organization is ubiquitous in systems far from thermodynamic equilibrium. While organized structures that emerge dominate transport properties, universal representations that identify and describe these key objects remain elusive. Here, we introduce a theoretically-grounded framework for describing emergent organization that, via data-driven algorithms, is constructive in practice. Its building blocks are spacetime lightcones that embody how information propagates across a system through local interactions. We show that predictive equivalence classes of lightcones -- local causal states -- capture organized behaviors and coherent structures in complex spatiotemporal systems. Employing an unsupervised physics-informed machine learning algorithm and a high-performance computing implementation, we demonstrate automatically discovering coherent structures in two real world domain science problems. We show that local causal states identify vortices and track their power-law decay behavior in two-dimensional fluid turbulence. We then show how to detect and track familiar extreme weather events -- hurricanes and atmospheric rivers -- and discover other novel coherent structures associated with precipitation extremes in high-resolution climate data at the grid-cell level.
    Generating Personalized Insulin Treatments Strategies with Deep Conditional Generative Time Series Models. (arXiv:2309.16521v1 [stat.ML])
    We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies. Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction
    Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space. (arXiv:2302.06807v3 [stat.ML] UPDATED)
    Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics for decision boundaries in a large margin classifiers setting leading to a non-convex optimization problem. In this paper, we propose a novel large margin classifier based on horospherical decision boundaries that leads to a geodesically convex optimization problem that can be optimized using any Riemannian gradient descent technique guaranteeing a globally optimal solution. We present several experiments depicting the competitive performance of our classifier in comparison to SOTA.
    Efficient Adversarial Input Generation via Neural Net Patching. (arXiv:2211.16808v2 [cs.LG] UPDATED)
    The generation of adversarial inputs has become a crucial issue in establishing the robustness and trustworthiness of deep neural nets, especially when they are used in safety-critical application domains such as autonomous vehicles and precision medicine. However, the problem poses multiple practical challenges, including scalability issues owing to large-sized networks, and the generation of adversarial inputs that lack important qualities such as naturalness and output-impartiality. This problem shares its end goal with the task of patching neural nets where small changes in some of the network's weights need to be discovered so that upon applying these changes, the modified net produces the desirable output for a given set of inputs. We exploit this connection by proposing to obtain an adversarial input from a patch, with the underlying observation that the effect of changing the weights can also be brought about by changing the inputs instead. Thus, this paper presents a novel way to generate input perturbations that are adversarial for a given network by using an efficient network patching technique. We note that the proposed method is significantly more effective than the prior state-of-the-art techniques.
    M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning. (arXiv:2309.16578v1 [stat.ML])
    Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those in training, which unleashes the appealing scaling for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
    D$^3$Fields: Dynamic 3D Descriptor Fields for Zero-Shot Generalizable Robotic Manipulation. (arXiv:2309.16118v1 [cs.RO])
    Scene representation has been a crucial design choice in robotic manipulation systems. An ideal representation should be 3D, dynamic, and semantic to meet the demands of diverse manipulation tasks. However, previous works often lack all three properties simultaneously. In this work, we introduce D$^3$Fields - dynamic 3D descriptor fields. These fields capture the dynamics of the underlying 3D environment and encode both semantic features and instance masks. Specifically, we project arbitrary 3D points in the workspace onto multi-view 2D visual observations and interpolate features derived from foundational models. The resulting fused descriptor fields allow for flexible goal specifications using 2D images with varied contexts, styles, and instances. To evaluate the effectiveness of these descriptor fields, we apply our representation to a wide range of robotic manipulation tasks in a zero-shot manner. Through extensive evaluation in both real-world scenarios and simulations, we demonstrate that D$^3$Fields are both generalizable and effective for zero-shot robotic manipulation tasks. In quantitative comparisons with state-of-the-art dense descriptors, such as Dense Object Nets and DINO, D$^3$Fields exhibit significantly better generalization abilities and manipulation accuracy.
    Resisting Backdoor Attacks in Federated Learning via Bidirectional Elections and Individual Perspective. (arXiv:2309.16456v1 [cs.LG])
    Existing approaches defend against backdoor attacks in federated learning (FL) mainly through a) mitigating the impact of infected models, or b) excluding infected models. The former negatively impacts model accuracy, while the latter usually relies on globally clear boundaries between benign and infected model updates. However, model updates are easy to be mixed and scattered throughout in reality due to the diverse distributions of local data. This work focuses on excluding infected models in FL. Unlike previous perspectives from a global view, we propose Snowball, a novel anti-backdoor FL framework through bidirectional elections from an individual perspective inspired by one principle deduced by us and two principles in FL and deep learning. It is characterized by a) bottom-up election, where each candidate model update votes to several peer ones such that a few model updates are elected as selectees for aggregation; and b) top-down election, where selectees progressively enlarge themselves through picking up from the candidates. We compare Snowball with state-of-the-art defenses to backdoor attacks in FL on five real-world datasets, demonstrating its superior resistance to backdoor attacks and slight impact on the accuracy of the global model.
    DeepPCR: Parallelizing Sequential Operations in Neural Networks. (arXiv:2309.16318v1 [cs.LG])
    Parallelization techniques have become ubiquitous for accelerating inference and training of deep neural networks. Despite this, several operations are still performed in a sequential manner. For instance, the forward and backward passes are executed layer-by-layer, and the output of diffusion models is produced by applying a sequence of denoising steps. This sequential approach results in a computational cost proportional to the number of steps involved, presenting a potential bottleneck as the number of steps increases. In this work, we introduce DeepPCR, a novel algorithm which parallelizes typically sequential operations used in inference and training of neural networks. DeepPCR is based on interpreting a sequence of $L$ steps as the solution of a specific system of equations, which we recover using the Parallel Cyclic Reduction algorithm. This reduces the complexity of computing the sequential operations from $\mathcal{O}(L)$ to $\mathcal{O}(\log_2L)$, thus yielding a speedup for large $L$. To verify the theoretical lower complexity of the algorithm, and to identify regimes for speedup, we test the effectiveness of DeepPCR in parallelizing the forward and backward pass in multi-layer perceptrons, and reach speedups of up to $30\times$ for forward and $200\times$ for backward pass. We additionally showcase the flexibility of DeepPCR by parallelizing training of ResNets with as many as 1024 layers, and generation in diffusion models, enabling up to $7\times$ faster training and $11\times$ faster generation, respectively, when compared to the sequential approach.
    CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. (arXiv:2309.16563v1 [stat.ML])
    We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in Gaussian model.
    E2Net: Resource-Efficient Continual Learning with Elastic Expansion Network. (arXiv:2309.16117v1 [cs.LG])
    Continual Learning methods are designed to learn new tasks without erasing previous knowledge. However, Continual Learning often requires massive computational power and storage capacity for satisfactory performance. In this paper, we propose a resource-efficient continual learning method called the Elastic Expansion Network (E2Net). Leveraging core subnet distillation and precise replay sample selection, E2Net achieves superior average accuracy and diminished forgetting within the same computational and storage constraints, all while minimizing processing time. In E2Net, we propose Representative Network Distillation to identify the representative core subnet by assessing parameter quantity and output similarity with the working network, distilling analogous subnets within the working network to mitigate reliance on rehearsal buffers and facilitating knowledge transfer across previous tasks. To enhance storage resource utilization, we then propose Subnet Constraint Experience Replay to optimize rehearsal efficiency through a sample storage strategy based on the structures of representative networks. Extensive experiments conducted predominantly on cloud environments with diverse datasets and also spanning the edge environment demonstrate that E2Net consistently outperforms state-of-the-art methods. In addition, our method outperforms competitors in terms of both storage and computational requirements.
    Geodesic Regression Characterizes 3D Shape Changes in the Female Brain During Menstruation. (arXiv:2309.16662v1 [cs.CV])
    Women are at higher risk of Alzheimer's and other neurological diseases after menopause, and yet research connecting female brain health to sex hormone fluctuations is limited. We seek to investigate this connection by developing tools that quantify 3D shape changes that occur in the brain during sex hormone fluctuations. Geodesic regression on the space of 3D discrete surfaces offers a principled way to characterize the evolution of a brain's shape. However, in its current form, this approach is too computationally expensive for practical use. In this paper, we propose approximation schemes that accelerate geodesic regression on shape spaces of 3D discrete surfaces. We also provide rules of thumb for when each approximation can be used. We test our approach on synthetic data to quantify the speed-accuracy trade-off of these approximations and show that practitioners can expect very significant speed-up while only sacrificing little accuracy. Finally, we apply the method to real brain shape data and produce the first characterization of how the female hippocampus changes shape during the menstrual cycle as a function of progesterone: a characterization made (practically) possible by our approximation schemes. Our work paves the way for comprehensive, practical shape analyses in the fields of bio-medicine and computer vision. Our implementation is publicly available on GitHub: https://github.com/bioshape-lab/my28brains.
    Hierarchical Network Data Analytics Framework for B5G Network Automation: Design and Implementation. (arXiv:2309.16269v1 [cs.NI])
    5G introduced modularized network functions (NFs) to support emerging services in a more flexible and elastic manner. To mitigate the complexity in such modularized NF management, automated network operation and management are indispensable, and thus the 3rd generation partnership project (3GPP) has introduced a network data analytics function (NWDAF). However, a conventional NWDAF needs to conduct both inference and training tasks, and thus it is difficult to provide the analytics results to NFs in a timely manner for an increased number of analytics requests. In this article, we propose a hierarchical network data analytics framework (H-NDAF) where inference tasks are distributed to multiple leaf NWDAFs and training tasks are conducted at the root NWDAF. Extensive simulation results using open-source software (i.e., free5GC) demonstrate that H-NDAF can provide sufficiently accurate analytics and faster analytics provision time compared to the conventional NWDAF.
    Projection based fuzzy least squares twin support vector machine for class imbalance problems. (arXiv:2309.15886v1 [cs.LG])
    Class imbalance is a major problem in many real world classification tasks. Due to the imbalance in the number of samples, the support vector machine (SVM) classifier gets biased toward the majority class. Furthermore, these samples are often observed with a certain degree of noise. Therefore, to remove these problems we propose a novel fuzzy based approach to deal with class imbalanced as well noisy datasets. We propose two approaches to address these problems. The first approach is based on the intuitionistic fuzzy membership, termed as robust energy-based intuitionistic fuzzy least squares twin support vector machine (IF-RELSTSVM). Furthermore, we introduce the concept of hyperplane-based fuzzy membership in our second approach, where the final classifier is termed as robust energy-based fuzzy least square twin support vector machine (F-RELSTSVM). By using this technique, the membership values are based on a projection based approach, where the data points are projected on the hyperplanes. The performance of the proposed algorithms is evaluated on several benchmark and synthetic datasets. The experimental results show that the proposed IF-RELSTSVM and F-RELSTSVM models outperform the baseline algorithms. Statistical tests are performed to check the significance of the proposed algorithms. The results show the applicability of the proposed algorithms on noisy as well as imbalanced datasets.
    Channel Vision Transformers: An Image Is Worth C x 16 x 16 Words. (arXiv:2309.16108v1 [cs.CV])
    Vision Transformer (ViT) has emerged as a powerful architecture in the realm of modern computer vision. However, its application in certain imaging fields, such as microscopy and satellite imaging, presents unique challenges. In these domains, images often contain multiple channels, each carrying semantically distinct and independent information. Furthermore, the model must demonstrate robustness to sparsity in input channels, as they may not be densely available during training or testing. In this paper, we propose a modification to the ViT architecture that enhances reasoning across the input channels and introduce Hierarchical Channel Sampling (HCS) as an additional regularization technique to ensure robustness when only partial channels are presented during test time. Our proposed model, ChannelViT, constructs patch tokens independently from each input channel and utilizes a learnable channel embedding that is added to the patch tokens, similar to positional embeddings. We evaluate the performance of ChannelViT on ImageNet, JUMP-CP (microscopy cell imaging), and So2Sat (satellite imaging). Our results show that ChannelViT outperforms ViT on classification tasks and generalizes well, even when a subset of input channels is used during testing. Across our experiments, HCS proves to be a powerful regularizer, independent of the architecture employed, suggesting itself as a straightforward technique for robust ViT training. Lastly, we find that ChannelViT generalizes effectively even when there is limited access to all channels during training, highlighting its potential for multi-channel imaging under real-world conditions with sparse sensors.
    Unified Long-Term Time-Series Forecasting Benchmark. (arXiv:2309.15946v1 [cs.LG])
    In order to support the advancement of machine learning methods for predicting time-series data, we present a comprehensive dataset designed explicitly for long-term time-series forecasting. We incorporate a collection of datasets obtained from diverse, dynamic systems and real-life records. Each dataset is standardized by dividing it into training and test trajectories with predetermined lookback lengths. We include trajectories of length up to $2000$ to ensure a reliable evaluation of long-term forecasting capabilities. To determine the most effective model in diverse scenarios, we conduct an extensive benchmarking analysis using classical and state-of-the-art models, namely LSTM, DeepAR, NLinear, N-Hits, PatchTST, and LatentODE. Our findings reveal intriguing performance comparisons among these models, highlighting the dataset-dependent nature of model effectiveness. Notably, we introduce a custom latent NLinear model and enhance DeepAR with a curriculum learning phase. Both consistently outperform their vanilla counterparts.
    Max-Sliced Mutual Information. (arXiv:2309.16200v1 [cs.LG])
    Quantifying the dependence between high-dimensional random variables is central to statistical learning and inference. Two classical methods are canonical correlation analysis (CCA), which identifies maximally correlated projected versions of the original variables, and Shannon's mutual information, which is a universal dependence measure that also captures high-order dependencies. However, CCA only accounts for linear dependence, which may be insufficient for certain applications, while mutual information is often infeasible to compute/estimate in high dimensions. This work proposes a middle ground in the form of a scalable information-theoretic generalization of CCA, termed max-sliced mutual information (mSMI). mSMI equals the maximal mutual information between low-dimensional projections of the high-dimensional variables, which reduces back to CCA in the Gaussian case. It enjoys the best of both worlds: capturing intricate dependencies in the data while being amenable to fast computation and scalable estimation from samples. We show that mSMI retains favorable structural properties of Shannon's mutual information, like variational forms and identification of independence. We then study statistical estimation of mSMI, propose an efficiently computable neural estimator, and couple it with formal non-asymptotic error bounds. We present experiments that demonstrate the utility of mSMI for several tasks, encompassing independence testing, multi-view representation learning, algorithmic fairness, and generative modeling. We observe that mSMI consistently outperforms competing methods with little-to-no computational overhead.
    Symbolic Imitation Learning: From Black-Box to Explainable Driving Policies. (arXiv:2309.16025v1 [cs.LG])
    Current methods of imitation learning (IL), primarily based on deep neural networks, offer efficient means for obtaining driving policies from real-world data but suffer from significant limitations in interpretability and generalizability. These shortcomings are particularly concerning in safety-critical applications like autonomous driving. In this paper, we address these limitations by introducing Symbolic Imitation Learning (SIL), a groundbreaking method that employs Inductive Logic Programming (ILP) to learn driving policies which are transparent, explainable and generalisable from available datasets. Utilizing the real-world highD dataset, we subject our method to a rigorous comparative analysis against prevailing neural-network-based IL methods. Our results demonstrate that SIL not only enhances the interpretability of driving policies but also significantly improves their applicability across varied driving situations. Hence, this work offers a novel pathway to more reliable and safer autonomous driving systems, underscoring the potential of integrating ILP into the domain of IL.
    Label Augmentation Method for Medical Landmark Detection in Hip Radiograph Images. (arXiv:2309.16066v1 [cs.LG])
    This work reports the empirical performance of an automated medical landmark detection method for predict clinical markers in hip radiograph images. Notably, the detection method was trained using a label-only augmentation scheme; our results indicate that this form of augmentation outperforms traditional data augmentation and produces highly sample efficient estimators. We train a generic U-Net-based architecture under a curriculum consisting of two phases: initially relaxing the landmarking task by enlarging the label points to regions, then gradually eroding these label regions back to the base task. We measure the benefits of this approach on six datasets of radiographs with gold-standard expert annotations.  ( 2 min )
    Imbalanced Data Stream Classification using Dynamic Ensemble Selection. (arXiv:2309.09175v2 [cs.LG] UPDATED)
    Modern streaming data categorization faces significant challenges from concept drift and class imbalanced data. This negatively impacts the output of the classifier, leading to improper classification. Furthermore, other factors such as the overlapping of multiple classes limit the extent of the correctness of the output. This work proposes a novel framework for integrating data pre-processing and dynamic ensemble selection, by formulating the classification framework for the nonstationary drifting imbalanced data stream, which employs the data pre-processing and dynamic ensemble selection techniques. The proposed framework was evaluated using six artificially generated data streams with differing imbalance ratios in combination with two different types of concept drifts. Each stream is composed of 200 chunks of 500 objects described by eight features and contains five concept drifts. Seven pre-processing techniques and two dynamic ensemble selection methods were considered. According to experimental results, data pre-processing combined with Dynamic Ensemble Selection techniques significantly delivers more accuracy when dealing with imbalanced data streams.
    Just Noticeable Difference Modeling for Face Recognition System. (arXiv:2209.05856v2 [cs.CV] UPDATED)
    High-quality face images are required to guarantee the stability and reliability of automatic face recognition (FR) systems in surveillance and security scenarios. However, a massive amount of face data is usually compressed before being analyzed due to limitations on transmission or storage. The compressed images may lose the powerful identity information, resulting in the performance degradation of the FR system. Herein, we make the first attempt to study just noticeable difference (JND) for the FR system, which can be defined as the maximum distortion that the FR system cannot notice. More specifically, we establish a JND dataset including 3530 original images and 137,670 compressed images generated by advanced reference encoding/decoding software based on the Versatile Video Coding (VVC) standard (VTM-15.0). Subsequently, we develop a novel JND prediction model to directly infer JND images for the FR system. In particular, in order to maximum redundancy removal without impairment of robust identity information, we apply the encoder with multiple feature extraction and attention-based feature decomposition modules to progressively decompose face features into two uncorrelated components, i.e., identity and residual features, via self-supervised learning. Then, the residual feature is fed into the decoder to generate the residual map. Finally, the predicted JND map is obtained by subtracting the residual map from the original image. Experimental results have demonstrated that the proposed model achieves higher accuracy of JND map prediction compared with the state-of-the-art JND models, and is capable of saving more bits while maintaining the performance of the FR system compared with VTM-15.0.
    High Perceptual Quality Wireless Image Delivery with Denoising Diffusion Models. (arXiv:2309.15889v1 [eess.IV])
    We consider the image transmission problem over a noisy wireless channel via deep learning-based joint source-channel coding (DeepJSCC) along with a denoising diffusion probabilistic model (DDPM) at the receiver. Specifically, we are interested in the perception-distortion trade-off in the practical finite block length regime, in which separate source and channel coding can be highly suboptimal. We introduce a novel scheme that utilizes the range-null space decomposition of the target image. We transmit the range-space of the image after encoding and employ DDPM to progressively refine its null space contents. Through extensive experiments, we demonstrate significant improvements in distortion and perceptual quality of reconstructed images compared to standard DeepJSCC and the state-of-the-art generative learning-based method. We will publicly share our source code to facilitate further research and reproducibility.
    Classical-to-quantum convolutional neural network transfer learning. (arXiv:2208.14708v2 [quant-ph] UPDATED)
    Machine learning using quantum convolutional neural networks (QCNNs) has demonstrated success in both quantum and classical data classification. In previous studies, QCNNs attained a higher classification accuracy than their classical counterparts under the same training conditions in the few-parameter regime. However, the general performance of large-scale quantum models is difficult to examine because of the limited size of quantum circuits, which can be reliably implemented in the near future. We propose transfer learning as an effective strategy for utilizing small QCNNs in the noisy intermediate-scale quantum era to the full extent. In the classical-to-quantum transfer learning framework, a QCNN can solve complex classification problems without requiring a large-scale quantum circuit by utilizing a pre-trained classical convolutional neural network (CNN). We perform numerical simulations of QCNN models with various sets of quantum convolution and pooling operations for MNIST data classification under transfer learning, in which a classical CNN is trained with Fashion-MNIST data. The results show that transfer learning from classical to quantum CNN performs considerably better than purely classical transfer learning models under similar training conditions.
    Discouraging posterior collapse in hierarchical Variational Autoencoders using context. (arXiv:2302.09976v2 [cs.LG] UPDATED)
    Hierarchical Variational Autoencoders (VAEs) are among the most popular likelihood-based generative models. There is a consensus that the top-down hierarchical VAEs allow effective learning of deep latent structures and avoid problems like posterior collapse. Here, we show that this is not necessarily the case, and the problem of collapsing posteriors remains. To discourage this issue, we propose a deep hierarchical VAE with a context on top. Specifically, we use a Discrete Cosine Transform to obtain the last latent variable. In a series of experiments, we observe that the proposed modification allows us to achieve better utilization of the latent space and does not harm the model's generative abilities.
    Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. (arXiv:2308.03666v3 [stat.ML] UPDATED)
    As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.
    Latent Graph Powered Semi-Supervised Learning on Biomedical Tabular Data. (arXiv:2309.15757v2 [cs.LG] UPDATED)
    In the domain of semi-supervised learning, the current approaches insufficiently exploit the potential of considering inter-instance relationships among (un)labeled data. In this work, we address this limitation by providing an approach for inferring latent graphs that capture the intrinsic data relationships. By leveraging graph-based representations, our approach facilitates the seamless propagation of information throughout the graph, enabling the effective incorporation of global and local knowledge. Through evaluations on biomedical tabular datasets, we compare the capabilities of our approach to other contemporary methods. Our work demonstrates the significance of inter-instance relationship discovery as practical means for constructing robust latent graphs to enhance semi-supervised learning techniques. Our method achieves state-of-the-art results on three biomedical datasets.
    Enhancing Sharpness-Aware Optimization Through Variance Suppression. (arXiv:2309.15639v2 [cs.LG] UPDATED)
    Sharpness-aware minimization (SAM) has well documented merits in enhancing generalization of deep neural networks, even without sizable data augmentation. Embracing the geometry of the loss function, where neighborhoods of 'flat minima' heighten generalization ability, SAM seeks 'flat valleys' by minimizing the maximum loss caused by an adversary perturbing parameters within the neighborhood. Although critical to account for sharpness of the loss function, such an 'over-friendly adversary' can curtail the outmost level of generalization. The novel approach of this contribution fosters stabilization of adversaries through variance suppression (VaSSO) to avoid such friendliness. VaSSO's provable stability safeguards its numerical improvement over SAM in model-agnostic tasks, including image classification and machine translation. In addition, experiments confirm that VaSSO endows SAM with robustness against high levels of label noise.
    Deep learning for bias-correcting CMIP6-class Earth system models. (arXiv:2301.01253v3 [physics.ao-ph] UPDATED)
    The accurate representation of precipitation in Earth system models (ESMs) is crucial for reliable projections of the ecological and socioeconomic impacts in response to anthropogenic global warming. The complex cross-scale interactions of processes that produce precipitation are challenging to model, however, inducing potentially strong biases in ESM fields, especially regarding extremes. State-of-the-art bias correction methods only address errors in the simulated frequency distributions locally at every individual grid cell. Improving unrealistic spatial patterns of the ESM output, which would require spatial context, has not been possible so far. Here, we show that a post-processing method based on physically constrained generative adversarial networks (cGANs) can correct biases of a state-of-the-art, CMIP6-class ESM both in local frequency distributions and in the spatial patterns at once. While our method improves local frequency distributions equally well as gold-standard bias-adjustment frameworks, it strongly outperforms any existing methods in the correction of spatial patterns, especially in terms of the characteristic spatial intermittency of precipitation extremes.
    Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs. (arXiv:2309.05516v2 [cs.CL] UPDATED)
    Large Language Models (LLMs) have proven their exceptional capabilities in performing language-related tasks. However, their deployment poses significant challenges due to their considerable memory and storage requirements. In response to this issue, weight-only quantization, particularly 3 and 4-bit weight-only quantization, has emerged as one of the most viable solutions. As the number of bits decreases, the quantization grid broadens, thus emphasizing the importance of up and down rounding. While previous studies have demonstrated that fine-tuning up and down rounding with the addition of perturbations can enhance accuracy in some scenarios, our study is driven by the precise and limited boundary of these perturbations, where only the threshold for altering the rounding value is of significance. Consequently, we propose a concise and highly effective approach for optimizing the weight rounding task. Our method, named SignRound, involves lightweight block-wise tuning using signed gradient descent, enabling us to achieve outstanding results within 400 steps. SignRound competes impressively against recent methods without introducing additional inference overhead. The source code will be publicly available at \url{https://github.com/intel/neural-compressor} soon.
    DIRA: Dynamic Domain Incremental Regularised Adaptation. (arXiv:2205.00147v4 [cs.LG] UPDATED)
    Autonomous systems (AS) often use Deep Neural Network (DNN) classifiers to allow them to operate in complex, high-dimensional, non-linear, and dynamically changing environments. Due to the complexity of these environments, DNN classifiers may output misclassifications during operation when they face domains not identified during development. Removing a system from operation for retraining becomes impractical as the number of such AS increases. To increase AS reliability and overcome this limitation, DNN classifiers need to have the ability to adapt during operation when faced with different operational domains using a few samples (e.g. 100 samples). However, retraining DNNs on a few samples is known to cause catastrophic forgetting. In this paper, we introduce Dynamic Incremental Regularised Adaptation (DIRA), a framework for operational domain adaption of DNN classifiers using regularisation techniques to overcome catastrophic forgetting and achieve adaptation when retraining using a few samples of the target domain. Our approach shows improvements on different image classification benchmarks aimed at evaluating robustness to distribution shifts (e.g.CIFAR-10C/100C, ImageNet-C), and produces state-of-the-art performance in comparison with other frameworks from the literature.
    STAG: Enabling Low Latency and Low Staleness of GNN-based Services with Dynamic Graphs. (arXiv:2309.15875v1 [cs.LG])
    Many emerging user-facing services adopt Graph Neural Networks (GNNs) to improve serving accuracy. When the graph used by a GNN model changes, representations (embedding) of nodes in the graph should be updated accordingly. However, the node representation update is too slow, resulting in either long response latency of user queries (the inference is performed after the update completes) or high staleness problem (the inference is performed based on stale data). Our in-depth analysis shows that the slow update is mainly due to neighbor explosion problem in graphs and duplicated computation. Based on such findings, we propose STAG, a GNN serving framework that enables low latency and low staleness of GNN-based services. It comprises a collaborative serving mechanism and an additivity-based incremental propagation strategy. With the collaborative serving mechanism, only part of node representations are updated during the update phase, and the final representations are calculated in the inference phase. It alleviates the neighbor explosion problem. The additivity-based incremental propagation strategy reuses intermediate data during the update phase, eliminating duplicated computation problem. Experimental results show that STAG accelerates the update phase by 1.3x~90.1x, and greatly reduces staleness time with a slight increase in response latency.
    IBIA: An Incremental Build-Infer-Approximate Framework for Approximate Inference of Partition Function. (arXiv:2304.06366v2 [cs.AI] UPDATED)
    Exact computation of the partition function is known to be intractable, necessitating approximate inference techniques. Existing methods for approximate inference are slow to converge for many benchmarks. The control of accuracy-complexity trade-off is also non-trivial in many of these methods. We propose a novel incremental build-infer-approximate (IBIA) framework for approximate inference that addresses these issues. In this framework, the probabilistic graphical model is converted into a sequence of clique tree forests (SCTF) with bounded clique sizes. We show that the SCTF can be used to efficiently compute the partition function. We propose two new algorithms which are used to construct the SCTF and prove the correctness of both. The first is an algorithm for incremental construction of CTFs that is guaranteed to give a valid CTF with bounded clique sizes and the second is an approximation algorithm that takes a calibrated CTF as input and yields a valid and calibrated CTF with reduced clique sizes as the output. We have evaluated our method using several benchmark sets from recent UAI competitions and our results show good accuracies with competitive runtimes.
  • Open

    Learning Interpretable Characteristic Kernels via Decision Forests. (arXiv:1812.00029v3 [stat.ML] UPDATED)
    Decision forests are widely used for classification and regression tasks. A lesser known property of tree-based methods is that one can construct a proximity matrix from the tree(s), and these proximity matrices are induced kernels. While there has been extensive research on the applications and properties of kernels, there is relatively little research on kernels induced by decision forests. We construct Kernel Mean Embedding Random Forests (KMERF), which induce kernels from random trees and/or forests using leaf-node proximity. We introduce the notion of an asymptotically characteristic kernel, and prove that KMERF kernels are asymptotically characteristic for both discrete and continuous data. Because KMERF is data-adaptive, we suspected it would outperform kernels selected a priori on finite sample data. We illustrate that KMERF nearly dominates current state-of-the-art kernel-based tests across a diverse range of high-dimensional two-sample and independence testing settings. Furthermore, our forest-based approach is interpretable, and provides feature importance metrics that readily distinguish important dimensions, unlike other high-dimensional non-parametric testing procedures. Hence, this work demonstrates the decision forest-based kernel can be more powerful and more interpretable than existing methods, flying in the face of conventional wisdom of the trade-off between the two.
    Patch-level Neighborhood Interpolation: A General and Effective Graph-based Regularization Strategy. (arXiv:1911.09307v2 [cs.LG] UPDATED)
    Regularization plays a crucial role in machine learning models, especially for deep neural networks. The existing regularization techniques mainly rely on the i.i.d. assumption and only consider the knowledge from the current sample, without the leverage of the neighboring relationship between samples. In this work, we propose a general regularizer called \textbf{Patch-level Neighborhood Interpolation~(Pani)} that conducts a non-local representation in the computation of networks. Our proposal explicitly constructs patch-level graphs in different layers and then linearly interpolates neighborhood patch features, serving as a general and effective regularization strategy. Further, we customize our approach into two kinds of popular regularization methods, namely Virtual Adversarial Training (VAT) and MixUp as well as its variants. The first derived \textbf{Pani VAT} presents a novel way to construct non-local adversarial smoothness by employing patch-level interpolated perturbations. The second derived \textbf{Pani MixUp} method extends the MixUp, and achieves superiority over MixUp and competitive performance over state-of-the-art variants of MixUp method with a significant advantage in computational efficiency. Extensive experiments have verified the effectiveness of our Pani approach in both supervised and semi-supervised settings.
    Nonparametric plug-in classifier for multiclass classification of S.D.E. paths. (arXiv:2212.10259v2 [math.ST] UPDATED)
    We study the multiclass classification problem where the features come from the mixture of time-homogeneous diffusions. Specifically, the classes are discriminated by their drift functions while the diffusion coefficient is common to all classes and unknown. In this framework, we build a plug-in classifier which relies on nonparametric estimators of the drift and diffusion functions. We first establish the consistency of our classification procedure under mild assumptions and then provide rates of cnvergence under different set of assumptions. Finally, a numerical study supports our theoretical findings.
    Asset Bundling for Wind Power Forecasting. (arXiv:2309.16492v1 [stat.ME])
    The growing penetration of intermittent, renewable generation in US power grids, especially wind and solar generation, results in increased operational uncertainty. In that context, accurate forecasts are critical, especially for wind generation, which exhibits large variability and is historically harder to predict. To overcome this challenge, this work proposes a novel Bundle-Predict-Reconcile (BPR) framework that integrates asset bundling, machine learning, and forecast reconciliation techniques. The BPR framework first learns an intermediate hierarchy level (the bundles), then predicts wind power at the asset, bundle, and fleet level, and finally reconciles all forecasts to ensure consistency. This approach effectively introduces an auxiliary learning task (predicting the bundle-level time series) to help the main learning tasks. The paper also introduces new asset-bundling criteria that capture the spatio-temporal dynamics of wind power time series. Extensive numerical experiments are conducted on an industry-size dataset of 283 wind farms in the MISO footprint. The experiments consider short-term and day-ahead forecasts, and evaluates a large variety of forecasting models that include weather predictions as covariates. The results demonstrate the benefits of BPR, which consistently and significantly improves forecast accuracy over baselines, especially at the fleet level.
    High-dimensional robust regression under heavy-tailed data: Asymptotics and Universality. (arXiv:2309.16476v1 [math.ST])
    We investigate the high-dimensional properties of robust regression estimators in the presence of heavy-tailed contamination of both the covariates and response functions. In particular, we provide a sharp asymptotic characterisation of M-estimators trained on a family of elliptical covariate and noise data distributions including cases where second and higher moments do not exist. We show that, despite being consistent, the Huber loss with optimally tuned location parameter $\delta$ is suboptimal in the high-dimensional regime in the presence of heavy-tailed noise, highlighting the necessity of further regularisation to achieve optimal performance. This result also uncovers the existence of a curious transition in $\delta$ as a function of the sample complexity and contamination. Moreover, we derive the decay rates for the excess risk of ridge regression. We show that, while it is both optimal and universal for noise distributions with finite second moment, its decay rate can be considerably faster when the covariates' second moment does not exist. Finally, we show that our formulas readily generalise to a richer family of models and data distributions, such as generalised linear estimation with arbitrary convex regularisation trained on mixture models.
    Flexible and efficient spatial extremes emulation via variational autoencoders. (arXiv:2307.08079v2 [stat.ML] UPDATED)
    Many real-world processes have complex tail dependence structures that cannot be characterized using classical Gaussian processes. More flexible spatial extremes models exhibit appealing extremal dependence properties but are often exceedingly prohibitive to fit and simulate from in high dimensions. In this paper, we develop a new spatial extremes model that has flexible and non-stationary dependence properties, and we integrate it in the encoding-decoding structure of a variational autoencoder (XVAE), whose parameters are estimated via variational Bayes combined with deep learning. The XVAE can be used as a spatio-temporal emulator that characterizes the distribution of potential mechanistic model output states and produces outputs that have the same statistical properties as the inputs, especially in the tail. As an aside, our approach also provides a novel way of making fast inference with complex extreme-value processes. Through extensive simulation studies, we show that our XVAE is substantially more time-efficient than traditional Bayesian inference while also outperforming many spatial extremes models with a stationary dependence structure. To further demonstrate the computational power of the XVAE, we analyze a high-resolution satellite-derived dataset of sea surface temperature in the Red Sea, which includes 30 years of daily measurements at 16703 grid cells. We find that the extremal dependence strength is weaker in the interior of Red Sea and it has decreased slightly over time.
    Unsupervised Fact Verification by Language Model Distillation. (arXiv:2309.16540v1 [cs.CL])
    Unsupervised fact verification aims to verify a claim using evidence from a trustworthy knowledge base without any kind of data annotation. To address this challenge, algorithms must produce features for every claim that are both semantically meaningful, and compact enough to find a semantic alignment with the source information. In contrast to previous work, which tackled the alignment problem by learning over annotated corpora of claims and their corresponding labels, we propose SFAVEL (Self-supervised Fact Verification via Language Model Distillation), a novel unsupervised framework that leverages pre-trained language models to distil self-supervised features into high-quality claim-fact alignments without the need for annotations. This is enabled by a novel contrastive loss function that encourages features to attain high-quality claim and evidence alignments whilst preserving the semantic relationships across the corpora. Notably, we present results that achieve a new state-of-the-art on the standard FEVER fact verification benchmark (+8% accuracy) with linear evaluation.  ( 2 min )
    Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit. (arXiv:2309.16620v1 [stat.ML])
    The cost of hyperparameter tuning in deep learning has been rising with model sizes, prompting practitioners to find new tuning methods using a proxy of smaller networks. One such proposal uses $\mu$P parameterized networks, where the optimal hyperparameters for small width networks transfer to networks with arbitrarily large width. However, in this scheme, hyperparameters do not transfer across depths. As a remedy, we study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization. We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, our empirical findings are supported and motivated by theory. Using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.
    Transport map unadjusted Langevin algorithms: learning and discretizing perturbed samplers. (arXiv:2302.07227v3 [stat.ME] UPDATED)
    Langevin dynamics are widely used in sampling high-dimensional, non-Gaussian distributions whose densities are known up to a normalizing constant. In particular, there is strong interest in unadjusted Langevin algorithms (ULA), which directly discretize Langevin dynamics to estimate expectations over the target distribution. We study the use of transport maps that approximately normalize a target distribution as a way to precondition and accelerate the convergence of Langevin dynamics. We show that in continuous time, when a transport map is applied to Langevin dynamics, the result is a Riemannian manifold Langevin dynamics (RMLD) with metric defined by the transport map. We also show that applying a transport map to an irreversibly-perturbed ULA results in a geometry-informed irreversible perturbation (GiIrr) of the original dynamics. These connections suggest more systematic ways of learning metrics and perturbations, and also yield alternative discretizations of the RMLD described by the map, which we study. Under appropriate conditions, these discretized processes can be endowed with non-asymptotic bounds describing convergence to the target distribution in 2-Wasserstein distance. Illustrative numerical results complement our theoretical claims.
    A parsimonious, computationally efficient machine learning method for spatial regression. (arXiv:2309.16448v1 [stat.ML])
    We introduce the modified planar rotator method (MPRS), a physically inspired machine learning method for spatial/temporal regression. MPRS is a non-parametric model which incorporates spatial or temporal correlations via short-range, distance-dependent ``interactions'' without assuming a specific form for the underlying probability distribution. Predictions are obtained by means of a fully autonomous learning algorithm which employs equilibrium conditional Monte Carlo simulations. MPRS is able to handle scattered data and arbitrary spatial dimensions. We report tests on various synthetic and real-word data in one, two and three dimensions which demonstrate that the MPRS prediction performance (without parameter tuning) is competitive with standard interpolation methods such as ordinary kriging and inverse distance weighting. In particular, MPRS is a particularly effective gap-filling method for rough and non-Gaussian data (e.g., daily precipitation time series). MPRS shows superior computational efficiency and scalability for large samples. Massive data sets involving millions of nodes can be processed in a few seconds on a standard personal computer.  ( 2 min )
    A framework for paired-sample hypothesis testing for high-dimensional data. (arXiv:2309.16274v1 [stat.ML])
    The standard paired-sample testing approach in the multidimensional setting applies multiple univariate tests on the individual features, followed by p-value adjustments. Such an approach suffers when the data carry numerous features. A number of studies have shown that classification accuracy can be seen as a proxy for two-sample testing. However, neither theoretical foundations nor practical recipes have been proposed so far on how this strategy could be extended to multidimensional paired-sample testing. In this work, we put forward the idea that scoring functions can be produced by the decision rules defined by the perpendicular bisecting hyperplanes of the line segments connecting each pair of instances. Then, the optimal scoring function can be obtained by the pseudomedian of those rules, which we estimate by extending naturally the Hodges-Lehmann estimator. We accordingly propose a framework of a two-step testing procedure. First, we estimate the bisecting hyperplanes for each pair of instances and an aggregated rule derived through the Hodges-Lehmann estimator. The paired samples are scored by this aggregated rule to produce a unidimensional representation. Second, we perform a Wilcoxon signed-rank test on the obtained representation. Our experiments indicate that our approach has substantial performance gains in testing accuracy compared to the traditional multivariate and multiple testing, while at the same time estimates each feature's contribution to the final result.  ( 2 min )
    From Complexity to Clarity: Analytical Expressions of Deep Neural Network Weights via Clifford's Geometric Algebra and Convexity. (arXiv:2309.16512v1 [cs.LG])
    In this paper, we introduce a novel analysis of neural networks based on geometric (Clifford) algebra and convex optimization. We show that optimal weights of deep ReLU neural networks are given by the wedge product of training samples when trained with standard regularized loss. Furthermore, the training problem reduces to convex optimization over wedge product features, which encode the geometric structure of the training dataset. This structure is given in terms of signed volumes of triangles and parallelotopes generated by data vectors. The convex problem finds a small subset of samples via $\ell_1$ regularization to discover only relevant wedge product features. Our analysis provides a novel perspective on the inner workings of deep neural networks and sheds light on the role of the hidden layers.  ( 2 min )
    Computational Lower Bounds for Graphon Estimation via Low-degree Polynomials. (arXiv:2308.15728v2 [math.ST] UPDATED)
    Graphon estimation has been one of the most fundamental problems in network analysis and has received considerable attention in the past decade. From the statistical perspective, the minimax error rate of graphon estimation has been established by Gao et al (2015) for both stochastic block model (SBM) and nonparametric graphon estimation. The statistical optimal estimators are based on constrained least squares and have computational complexity exponential in the dimension. From the computational perspective, the best-known polynomial-time estimator is based on universal singular value thresholding (USVT), but it can only achieve a much slower estimation error rate than the minimax one. It is natural to wonder if such a gap is essential. The computational optimality of the USVT or the existence of a computational barrier in graphon estimation has been a long-standing open problem. In this work, we take the first step towards it and provide rigorous evidence for the computational barrier in graphon estimation via low-degree polynomials. Specifically, in both SBM and nonparametric graphon estimation, we show that for low-degree polynomial estimators, their estimation error rates cannot be significantly better than that of the USVT under a wide range of parameter regimes. Our results are proved based on the recent development of low-degree polynomials by Schramm and Wein (2022), while we overcome a few key challenges in applying it to the general graphon estimation problem. By leveraging our main results, we also provide a computational lower bound on the clustering error for community detection in SBM with a growing number of communities and this yields a new piece of evidence for the conjectured Kesten-Stigum threshold for efficient community recovery.
    A Primer on Bayesian Neural Networks: Review and Debates. (arXiv:2309.16314v1 [stat.ML])
    Neural networks have achieved remarkable performance across various problem domains, but their widespread applicability is hindered by inherent limitations such as overconfidence in predictions, lack of interpretability, and vulnerability to adversarial attacks. To address these challenges, Bayesian neural networks (BNNs) have emerged as a compelling extension of conventional neural networks, integrating uncertainty estimation into their predictive capabilities. This comprehensive primer presents a systematic introduction to the fundamental concepts of neural networks and Bayesian inference, elucidating their synergistic integration for the development of BNNs. The target audience comprises statisticians with a potential background in Bayesian methods but lacking deep learning expertise, as well as machine learners proficient in deep neural networks but with limited exposure to Bayesian statistics. We provide an overview of commonly employed priors, examining their impact on model behavior and performance. Additionally, we delve into the practical considerations associated with training and inference in BNNs. Furthermore, we explore advanced topics within the realm of BNN research, acknowledging the existence of ongoing debates and controversies. By offering insights into cutting-edge developments, this primer not only equips researchers and practitioners with a solid foundation in BNNs, but also illuminates the potential applications of this dynamic field. As a valuable resource, it fosters an understanding of BNNs and their promising prospects, facilitating further advancements in the pursuit of knowledge and innovation.
    Generative Semi-supervised Learning with Meta-Optimized Synthetic Samples. (arXiv:2309.16143v1 [cs.LG])
    Semi-supervised learning (SSL) is a promising approach for training deep classification models using labeled and unlabeled datasets. However, existing SSL methods rely on a large unlabeled dataset, which may not always be available in many real-world applications due to legal constraints (e.g., GDPR). In this paper, we investigate the research question: Can we train SSL models without real unlabeled datasets? Instead of using real unlabeled datasets, we propose an SSL method using synthetic datasets generated from generative foundation models trained on datasets containing millions of samples in diverse domains (e.g., ImageNet). Our main concepts are identifying synthetic samples that emulate unlabeled samples from generative foundation models and training classifiers using these synthetic samples. To achieve this, our method is formulated as an alternating optimization problem: (i) meta-learning of generative foundation models and (ii) SSL of classifiers using real labeled and synthetic unlabeled samples. For (i), we propose a meta-learning objective that optimizes latent variables to generate samples that resemble real labeled samples and minimize the validation loss. For (ii), we propose a simple unsupervised loss function that regularizes the feature extractors of classifiers to maximize the performance improvement obtained from synthetic samples. We confirm that our method outperforms baselines using generative foundation models on SSL. We also demonstrate that our methods outperform SSL using real unlabeled datasets in scenarios with extremely small amounts of labeled datasets. This suggests that synthetic samples have the potential to provide improvement gains more efficiently than real unlabeled data.  ( 3 min )
    Stackelberg Batch Policy Learning. (arXiv:2309.16188v1 [stat.ML])
    Batch reinforcement learning (RL) defines the task of learning from a fixed batch of data lacking exhaustive exploration. Worst-case optimality algorithms, which calibrate a value-function model class from logged experience and perform some type of pessimistic evaluation under the learned model, have emerged as a promising paradigm for batch RL. However, contemporary works on this stream have commonly overlooked the hierarchical decision-making structure hidden in the optimization landscape. In this paper, we adopt a game-theoretical viewpoint and model the policy learning diagram as a two-player general-sum game with a leader-follower structure. We propose a novel stochastic gradient-based learning algorithm: StackelbergLearner, in which the leader player updates according to the total derivative of its objective instead of the usual individual gradient, and the follower player makes individual updates and ensures transition-consistent pessimistic reasoning. The derived learning dynamic naturally lends StackelbergLearner to a game-theoretic interpretation and provides a convergence guarantee to differentiable Stackelberg equilibria. From a theoretical standpoint, we provide instance-dependent regret bounds with general function approximation, which shows that our algorithm can learn a best-effort policy that is able to compete against any comparator policy that is covered by batch data. Notably, our theoretical regret guarantees only require realizability without any data coverage and strong function approximation conditions, e.g., Bellman closedness, which is in contrast to prior works lacking such guarantees. Through comprehensive experiments, we find that our algorithm consistently performs as well or better as compared to state-of-the-art methods in batch RL benchmark and real-world datasets.
    Feature Normalization Prevents Collapse of Non-contrastive Learning Dynamics. (arXiv:2309.16109v1 [cs.LG])
    Contrastive learning is a self-supervised representation learning framework, where two positive views generated through data augmentation are made similar by an attraction force in a data representation space, while a repulsive force makes them far from negative examples. Non-contrastive learning, represented by BYOL and SimSiam, further gets rid of negative examples and improves computational efficiency. While learned representations may collapse into a single point due to the lack of the repulsive force at first sight, Tian et al. (2021) revealed through the learning dynamics analysis that the representations can avoid collapse if data augmentation is sufficiently stronger than regularization. However, their analysis does not take into account commonly-used feature normalization, a normalizer before measuring the similarity of representations, and hence excessively strong regularization may collapse the dynamics, which is an unnatural behavior under the presence of feature normalization. Therefore, we extend the previous theory based on the L2 loss by considering the cosine loss, which involves feature normalization. We show that the cosine loss induces sixth-order dynamics (while the L2 loss induces a third-order one), in which a stable equilibrium dynamically emerges even if there are only collapsed solutions with given initial parameters. Thus, we offer a new understanding that feature normalization plays an important role in robustly preventing the dynamics collapse.
    Generating Personalized Insulin Treatments Strategies with Deep Conditional Generative Time Series Models. (arXiv:2309.16521v1 [stat.ML])
    We propose a novel framework that combines deep generative time series models with decision theory for generating personalized treatment strategies. It leverages historical patient trajectory data to jointly learn the generation of realistic personalized treatment and future outcome trajectories through deep generative time series models. In particular, our framework enables the generation of novel multivariate treatment strategies tailored to the personalized patient history and trained for optimal expected future outcomes based on conditional expected utility maximization. We demonstrate our framework by generating personalized insulin treatment strategies and blood glucose predictions for hospitalized diabetes patients, showcasing the potential of our approach for generating improved personalized treatment strategies. Keywords: deep generative model, probabilistic decision support, personalized treatment generation, insulin and blood glucose prediction  ( 2 min )
    Bridging Trustworthiness and Open-World Learning: An Exploratory Neural Approach for Enhancing Interpretability, Generalization, and Robustness. (arXiv:2308.03666v3 [stat.ML] UPDATED)
    As researchers strive to narrow the gap between machine intelligence and human through the development of artificial intelligence technologies, it is imperative that we recognize the critical importance of trustworthiness in open-world, which has become ubiquitous in all aspects of daily life for everyone. However, several challenges may create a crisis of trust in current artificial intelligence systems that need to be bridged: 1) Insufficient explanation of predictive results; 2) Inadequate generalization for learning models; 3) Poor adaptability to uncertain environments. Consequently, we explore a neural program to bridge trustworthiness and open-world learning, extending from single-modal to multi-modal scenarios for readers. 1) To enhance design-level interpretability, we first customize trustworthy networks with specific physical meanings; 2) We then design environmental well-being task-interfaces via flexible learning regularizers for improving the generalization of trustworthy learning; 3) We propose to increase the robustness of trustworthy learning by integrating open-world recognition losses with agent mechanisms. Eventually, we enhance various trustworthy properties through the establishment of design-level explainability, environmental well-being task-interfaces and open-world recognition programs. These designed open-world protocols are applicable across a wide range of surroundings, under open-world multimedia recognition scenarios with significant performance improvements observed.
    Horospherical Decision Boundaries for Large Margin Classification in Hyperbolic Space. (arXiv:2302.06807v3 [stat.ML] UPDATED)
    Hyperbolic spaces have been quite popular in the recent past for representing hierarchically organized data. Further, several classification algorithms for data in these spaces have been proposed in the literature. These algorithms mainly use either hyperplanes or geodesics for decision boundaries in a large margin classifiers setting leading to a non-convex optimization problem. In this paper, we propose a novel large margin classifier based on horospherical decision boundaries that leads to a geodesically convex optimization problem that can be optimized using any Riemannian gradient descent technique guaranteeing a globally optimal solution. We present several experiments depicting the competitive performance of our classifier in comparison to SOTA.
    Exploiting Edge Features in Graphs with Fused Network Gromov-Wasserstein Distance. (arXiv:2309.16604v1 [stat.ML])
    Pairwise comparison of graphs is key to many applications in Machine learning ranging from clustering, kernel-based classification/regression and more recently supervised graph prediction. Distances between graphs usually rely on informative representations of these structured objects such as bag of substructures or other graph embeddings. A recently popular solution consists in representing graphs as metric measure spaces, allowing to successfully leverage Optimal Transport, which provides meaningful distances allowing to compare them: the Gromov-Wasserstein distances. However, this family of distances overlooks edge attributes, which are essential for many structured objects. In this work, we introduce an extension of Gromov-Wasserstein distance for comparing graphs whose both nodes and edges have features. We propose novel algorithms for distance and barycenter computation. We empirically show the effectiveness of the novel distance in learning tasks where graphs occur in either input space or output space, such as classification and graph prediction.
    Lossless Transformations and Excess Risk Bounds in Statistical Inference. (arXiv:2307.16735v2 [cs.IT] UPDATED)
    We study the excess minimum risk in statistical inference, defined as the difference between the minimum expected loss in estimating a random variable from an observed feature vector and the minimum expected loss in estimating the same random variable from a transformation (statistic) of the feature vector. After characterizing lossless transformations, i.e., transformations for which the excess risk is zero for all loss functions, we construct a partitioning test statistic for the hypothesis that a given transformation is lossless and show that for i.i.d. data the test is strongly consistent. More generally, we develop information-theoretic upper bounds on the excess risk that uniformly hold over fairly general classes of loss functions. Based on these bounds, we introduce the notion of a delta-lossless transformation and give sufficient conditions for a given transformation to be universally delta-lossless. Applications to classification, nonparametric regression, portfolio strategies, information bottleneck, and deep learning, are also surveyed.
    Dynamic Selection in Algorithmic Decision-making. (arXiv:2108.12547v3 [econ.EM] UPDATED)
    This paper identifies and addresses dynamic selection problems in online learning algorithms with endogenous data. In a contextual multi-armed bandit model, a novel bias (self-fulfilling bias) arises because the endogeneity of the data influences the choices of decisions, affecting the distribution of future data to be collected and analyzed. We propose an instrumental-variable-based algorithm to correct for the bias. It obtains true parameter values and attains low (logarithmic-like) regret levels. We also prove a central limit theorem for statistical inference. To establish the theoretical properties, we develop a general technique that untangles the interdependence between data and actions.
    Data Augmentation in the Underparameterized and Overparameterized Regimes. (arXiv:2202.09134v3 [cs.LG] UPDATED)
    We provide results that exactly quantify how data augmentation affects the variance and limiting distribution of estimates, and analyze several specific models in detail. The results confirm some observations made in machine learning practice, but also lead to unexpected findings: Data augmentation may increase rather than decrease the uncertainty of estimates, such as the empirical prediction risk. It can act as a regularizer, but fails to do so in certain high-dimensional problems, and it may shift the double-descent peak of an empirical risk. Overall, the analysis shows that several properties data augmentation has been attributed with are not either true or false, but rather depend on a combination of factors -- notably the data distribution, the properties of the estimator, and the interplay of sample size, number of augmentations, and dimension. Our main theoretical tool is a limit theorem for functions of randomly transformed, high-dimensional random vectors. The proof draws on work in probability on noise stability of functions of many variables.
    Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints. (arXiv:2309.16240v1 [cs.LG])
    The increasing capabilities of large language models (LLMs) raise opportunities for artificial general intelligence but concurrently amplify safety concerns, such as potential misuse of AI systems, necessitating effective AI alignment. Reinforcement Learning from Human Feedback (RLHF) has emerged as a promising pathway towards AI alignment but brings forth challenges due to its complexity and dependence on a separate reward model. Direct Preference Optimization (DPO) has been proposed as an alternative, and it remains equivalent to RLHF under the reverse KL regularization constraint. This paper presents $f$-DPO, a generalized approach to DPO by incorporating diverse divergence constraints. We show that under certain $f$-divergences, including Jensen-Shannon divergence, forward KL divergences and $\alpha$-divergences, the complex relationship between the reward and optimal policy can also be simplified by addressing the Karush-Kuhn-Tucker conditions. This eliminates the need for estimating the normalizing constant in the Bradley-Terry model and enables a tractable mapping between the reward function and the optimal policy. Our approach optimizes LLMs to align with human preferences in a more efficient and supervised manner under a broad set of divergence constraints. Empirically, adopting these divergences ensures a balance between alignment performance and generation diversity. Importantly, $f$-DPO outperforms PPO-based methods in divergence efficiency, and divergence constraints directly influence expected calibration error (ECE).
    M-OFDFT: Overcoming the Barrier of Orbital-Free Density Functional Theory for Molecular Systems Using Deep Learning. (arXiv:2309.16578v1 [stat.ML])
    Orbital-free density functional theory (OFDFT) is a quantum chemistry formulation that has a lower cost scaling than the prevailing Kohn-Sham DFT, which is increasingly desired for contemporary molecular research. However, its accuracy is limited by the kinetic energy density functional, which is notoriously hard to approximate for non-periodic molecular systems. In this work, we propose M-OFDFT, an OFDFT approach capable of solving molecular systems using a deep-learning functional model. We build the essential nonlocality into the model, which is made affordable by the concise density representation as expansion coefficients under an atomic basis. With techniques to address unconventional learning challenges therein, M-OFDFT achieves a comparable accuracy with Kohn-Sham DFT on a wide range of molecules untouched by OFDFT before. More attractively, M-OFDFT extrapolates well to molecules much larger than those in training, which unleashes the appealing scaling for studying large molecules including proteins, representing an advancement of the accuracy-efficiency trade-off frontier in quantum chemistry.
    Smooth Nested Simulation: Bridging Cubic and Square Root Convergence Rates in High Dimensions. (arXiv:2201.02958v5 [stat.ME] UPDATED)
    Nested simulation concerns estimating functionals of a conditional expectation via simulation. In this paper, we propose a new method based on kernel ridge regression to exploit the smoothness of the conditional expectation as a function of the multidimensional conditioning variable. Asymptotic analysis shows that the proposed method can effectively alleviate the curse of dimensionality on the convergence rate as the simulation budget increases, provided that the conditional expectation is sufficiently smooth. The smoothness bridges the gap between the cubic root convergence rate (that is, the optimal rate for the standard nested simulation) and the square root convergence rate (that is, the canonical rate for the standard Monte Carlo simulation). We demonstrate the performance of the proposed method via numerical examples from portfolio risk management and input uncertainty quantification.
    Is My Prediction Arbitrary? Confounding Effects of Variance in Fair Classification. (arXiv:2301.11562v5 [cs.LG] UPDATED)
    Variance in predictions across different trained models is a significant, under-explored source of error in fair classification. In practice, the variance on some data examples is so large that decisions can be effectively arbitrary. To investigate this problem, we take an experimental approach and make four overarching contributions: We 1) Define a metric called self-consistency, derived from variance, which we use as a proxy for measuring and reducing arbitrariness; 2) Develop an ensembling algorithm that abstains from classification when a prediction would be arbitrary; 3) Conduct the largest to-date empirical study of the role of variance (vis-a-vis self-consistency and arbitrariness) in fair classification; and, 4) Release a toolkit that makes the US Home Mortgage Disclosure Act (HMDA) datasets easily usable for future research. Altogether, our experiments reveal shocking insights about the reliability of conclusions on benchmark datasets. Most fairness classification benchmarks are close-to-fair when taking into account the amount of arbitrariness present in predictions -- before we even try to apply common fairness interventions. This finding calls into question the practical utility of common algorithmic fairness methods, and in turn suggests that we should fundamentally reconsider how we choose to measure fairness in machine learning.
    Selective Nonparametric Regression via Testing. (arXiv:2309.16412v1 [stat.ML])
    Prediction with the possibility of abstention (or selective prediction) is an important problem for error-critical machine learning applications. While well-studied in the classification setup, selective approaches to regression are much less developed. In this work, we consider the nonparametric heteroskedastic regression problem and develop an abstention procedure via testing the hypothesis on the value of the conditional variance at a given point. Unlike existing methods, the proposed one allows to account not only for the value of the variance itself but also for the uncertainty of the corresponding variance predictor. We prove non-asymptotic bounds on the risk of the resulting estimator and show the existence of several different convergence regimes. Theoretical analysis is illustrated with a series of experiments on simulated and real-world data.
    Cross-Prediction-Powered Inference. (arXiv:2309.16598v1 [stat.ML])
    While reliable data-driven decision-making hinges on high-quality labeled data, the acquisition of quality labels often involves laborious human annotations or slow and expensive scientific measurements. Machine learning is becoming an appealing alternative as sophisticated predictive techniques are being used to quickly and cheaply produce large amounts of predicted labels; e.g., predicted protein structures are used to supplement experimentally derived structures, predictions of socioeconomic indicators from satellite imagery are used to supplement accurate survey data, and so on. Since predictions are imperfect and potentially biased, this practice brings into question the validity of downstream inferences. We introduce cross-prediction: a method for valid inference powered by machine learning. With a small labeled dataset and a large unlabeled dataset, cross-prediction imputes the missing labels via machine learning and applies a form of debiasing to remedy the prediction inaccuracies. The resulting inferences achieve the desired error probability and are more powerful than those that only leverage the labeled data. Closely related is the recent proposal of prediction-powered inference, which assumes that a good pre-trained model is already available. We show that cross-prediction is consistently more powerful than an adaptation of prediction-powered inference in which a fraction of the labeled data is split off and used to train the model. Finally, we observe that cross-prediction gives more stable conclusions than its competitors; its confidence intervals typically have significantly lower variability.
    Private Stochastic Optimization With Large Worst-Case Lipschitz Parameter: Optimal Rates for (Non-Smooth) Convex Losses and Extension to Non-Convex Losses. (arXiv:2209.07403v4 [cs.LG] UPDATED)
    We study differentially private (DP) stochastic optimization (SO) with loss functions whose worst-case Lipschitz parameter over all data points may be extremely large. To date, the vast majority of work on DP SO assumes that the loss is uniformly Lipschitz continuous over data (i.e. stochastic gradients are uniformly bounded over all data points). While this assumption is convenient, it often leads to pessimistic excess risk bounds. In many practical problems, the worst-case (uniform) Lipschitz parameter of the loss over all data points may be extremely large due to outliers. In such cases, the error bounds for DP SO, which scale with the worst-case Lipschitz parameter of the loss, are vacuous. To address these limitations, this work provides near-optimal excess risk bounds that do not depend on the uniform Lipschitz parameter of the loss. Building on a recent line of work (Wang et al., 2020; Kamath et al., 2022), we assume that stochastic gradients have bounded $k$-th order moments for some $k \geq 2$. Compared with works on uniformly Lipschitz DP SO, our excess risk scales with the $k$-th moment bound instead of the uniform Lipschitz parameter of the loss, allowing for significantly faster rates in the presence of outliers and/or heavy-tailed data. For convex and strongly convex loss functions, we provide the first asymptotically optimal excess risk bounds (up to a logarithmic factor). In contrast to (Wang et al., 2020; Kamath et al., 2022), our bounds do not require the loss function to be differentiable/smooth. We also devise a linear-time algorithm for smooth losses that has excess risk that is tight in certain practical parameter regimes. Additionally, our work is the first to address non-convex non-uniformly Lipschitz loss functions satisfying the Proximal-PL inequality; this covers some practical machine learning models. Our Proximal-PL algorithm has near-optimal excess risk.
    CRIMED: Lower and Upper Bounds on Regret for Bandits with Unbounded Stochastic Corruption. (arXiv:2309.16563v1 [stat.ML])
    We investigate the regret-minimisation problem in a multi-armed bandit setting with arbitrary corruptions. Similar to the classical setup, the agent receives rewards generated independently from the distribution of the arm chosen at each time. However, these rewards are not directly observed. Instead, with a fixed $\varepsilon\in (0,\frac{1}{2})$, the agent observes a sample from the chosen arm's distribution with probability $1-\varepsilon$, or from an arbitrary corruption distribution with probability $\varepsilon$. Importantly, we impose no assumptions on these corruption distributions, which can be unbounded. In this setting, accommodating potentially unbounded corruptions, we establish a problem-dependent lower bound on regret for a given family of arm distributions. We introduce CRIMED, an asymptotically-optimal algorithm that achieves the exact lower bound on regret for bandits with Gaussian distributions with known variance. Additionally, we provide a finite-sample analysis of CRIMED's regret performance. Notably, CRIMED can effectively handle corruptions with $\varepsilon$ values as high as $\frac{1}{2}$. Furthermore, we develop a tight concentration result for medians in the presence of arbitrary corruptions, even with $\varepsilon$ values up to $\frac{1}{2}$, which may be of independent interest. We also discuss an extension of the algorithm for handling misspecification in Gaussian model.
    Constructing Synthetic Treatment Groups without the Mean Exchangeability Assumption. (arXiv:2309.16409v1 [stat.ML])
    The purpose of this work is to transport the information from multiple randomized controlled trials to the target population where we only have the control group data. Previous works rely critically on the mean exchangeability assumption. However, as pointed out by many current studies, the mean exchangeability assumption might be violated. Motivated by the synthetic control method, we construct a synthetic treatment group for the target population by a weighted mixture of treatment groups of source populations. We estimate the weights by minimizing the conditional maximum mean discrepancy between the weighted control groups of source populations and the target population. We establish the asymptotic normality of the synthetic treatment group estimator based on the sieve semiparametric theory. Our method can serve as a novel complementary approach when the mean exchangeability assumption is violated. Experiments are conducted on synthetic and real-world datasets to demonstrate the effectiveness of our methods.  ( 2 min )
    HyperBO+: Pre-training a universal prior for Bayesian optimization with hierarchical Gaussian processes. (arXiv:2212.10538v2 [cs.LG] UPDATED)
    Bayesian optimization (BO), while proved highly effective for many black-box function optimization tasks, requires practitioners to carefully select priors that well model their functions of interest. Rather than specifying by hand, researchers have investigated transfer learning based methods to automatically learn the priors, e.g. multi-task BO (Swersky et al., 2013), few-shot BO (Wistuba and Grabocka, 2021) and HyperBO (Wang et al., 2022). However, those prior learning methods typically assume that the input domains are the same for all tasks, weakening their ability to use observations on functions with different domains or generalize the learned priors to BO on different search spaces. In this work, we present HyperBO+: a pre-training approach for hierarchical Gaussian processes that enables the same prior to work universally for Bayesian optimization on functions with different domains. We propose a two-step pre-training method and analyze its appealing asymptotic properties and benefits to BO both theoretically and empirically. On real-world hyperparameter tuning tasks that involve multiple search spaces, we demonstrate that HyperBO+ is able to generalize to unseen search spaces and achieves lower regrets than competitive baselines.  ( 2 min )
    Transfer Learning for Bayesian Optimization on Heterogeneous Search Spaces. (arXiv:2309.16597v1 [cs.LG])
    Bayesian optimization (BO) is a popular black-box function optimization method, which makes sequential decisions based on a Bayesian model, typically a Gaussian process (GP), of the function. To ensure the quality of the model, transfer learning approaches have been developed to automatically design GP priors by learning from observations on "training" functions. These training functions are typically required to have the same domain as the "test" function (black-box function to be optimized). In this paper, we introduce MPHD, a model pre-training method on heterogeneous domains, which uses a neural net mapping from domain-specific contexts to specifications of hierarchical GPs. MPHD can be seamlessly integrated with BO to transfer knowledge across heterogeneous search spaces. Our theoretical and empirical results demonstrate the validity of MPHD and its superior performance on challenging black-box function optimization tasks.  ( 2 min )
    Nonparametric estimation of a covariate-adjusted counterfactual treatment regimen response curve. (arXiv:2309.16099v1 [math.ST])
    Flexible estimation of the mean outcome under a treatment regimen (i.e., value function) is the key step toward personalized medicine. We define our target parameter as a conditional value function given a set of baseline covariates which we refer to as a stratum based value function. We focus on semiparametric class of decision rules and propose a sieve based nonparametric covariate adjusted regimen-response curve estimator within that class. Our work contributes in several ways. First, we propose an inverse probability weighted nonparametrically efficient estimator of the smoothed regimen-response curve function. We show that asymptotic linearity is achieved when the nuisance functions are undersmoothed sufficiently. Asymptotic and finite sample criteria for undersmoothing are proposed. Second, using Gaussian process theory, we propose simultaneous confidence intervals for the smoothed regimen-response curve function. Third, we provide consistency and convergence rate for the optimizer of the regimen-response curve estimator; this enables us to estimate an optimal semiparametric rule. The latter is important as the optimizer corresponds with the optimal dynamic treatment regimen. Some finite-sample properties are explored with simulations.  ( 2 min )
    Improving Adaptive Online Learning Using Refined Discretization. (arXiv:2309.16044v1 [cs.LG])
    We study unconstrained Online Linear Optimization with Lipschitz losses. The goal is to simultaneously achieve ($i$) second order gradient adaptivity; and ($ii$) comparator norm adaptivity also known as "parameter freeness" in the literature. Existing regret bounds (Cutkosky and Orabona, 2018; Mhammedi and Koolen, 2020; Jacobsen and Cutkosky, 2022) have the suboptimal $O(\sqrt{V_T\log V_T})$ dependence on the gradient variance $V_T$, while the present work improves it to the optimal rate $O(\sqrt{V_T})$ using a novel continuous-time-inspired algorithm, without any impractical doubling trick. This result can be extended to the setting with unknown Lipschitz constant, eliminating the range ratio problem from prior works (Mhammedi and Koolen, 2020). Concretely, we first show that the aimed simultaneous adaptivity can be achieved fairly easily in a continuous time analogue of the problem, where the environment is modeled by an arbitrary continuous semimartingale. Then, our key innovation is a new discretization argument that preserves such adaptivity in the discrete time adversarial setting. This refines a non-gradient-adaptive discretization argument from (Harvey et al., 2023), both algorithmically and analytically, which could be of independent interest.  ( 2 min )

  • Open

    [R] Drive Like a Human: Rethinking Autonomous Driving with Large Language Models
    Paper - https://arxiv.org/abs/2307.07162 submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [Research] - Resource to query ML and LLM based research
    Made a repo for you all to try using a collaborative AI tool which includes 100+ papers on LLM-Based-Agents. You can try out the repo here: https://www.collama.ai/varun/llm-based-agents submitted by /u/_llama2 [link] [comments]  ( 9 min )
    [D] Choosing the best learning model for a start up app?
    Straight off the bat: I am not very familiar but was tasked to find a suggest a reasonable model for our need. Here is a bit what I read: https://www.obviously.ai/post/how-to-choose-the-right-ai-model-for-your-application https://www.addevice.io/blog/ai-framework-for-app-development The app that I am working on is an education app, and the purpose of the AI would be to (at least in terms of priority) generate a post subject line / topic to discuss. The company is super small, so money is important. JS is being used mainly at the moment. What would be a good choice for a small start up to generate topics for an education app used by schools? At least any ideas or things to consider would be wonderful to get my rabbit hole dive started! Thanks. submitted by /u/Willy988 [link] [comments]  ( 9 min )
    [R] Gsgen: Text-to-3D using Gaussian Splatting
    Project Page Paper Code In this paper, we present Gaussian Splatting based text-to-3D generation (GSGEN), a novel approach for generating high-quality 3D objects. Previous methods suffer from inaccurate geometry and limited fidelity due to the absence of 3D prior and proper representation. We leverage 3D Gaussian Splatting, a recent state-of-the-art representation, to address existing shortcomings by exploiting the explicit nature that enables the incorporation of 3D prior. Specifically, our method adopts a progressive optimization strategy, which includes a geometry optimization stage and an appearance refinement stage. In geometry optimization, a coarse representation is established under a 3D geometry prior along with the ordinary 2D SDS loss, ensuring a sensible and 3D-consistent rough shape. Subsequently, the obtained Gaussians undergo an iterative refinement to enrich details. In this stage, we increase the number of Gaussians by compactness-based densification to enhance continuity and improve fidelity. With these designs, our approach can generate 3D content with delicate details and more accurate geometry. Extensive evaluations demonstrate the effectiveness of our method, especially for capturing high-frequency components. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [D] Does anyone else feel like MOJO isn't getting the attention it deserves?
    https://docs.modular.com/mojo/ submitted by /u/hai_cben [link] [comments]  ( 9 min )
    [P] Carton – Run any ML model from any programming language
    Hi! I just open-sourced a project that I've been working on for a while and wanted to see what you think! The goal of Carton (https://carton.run) is to let you use a single interface to run any machine learning model from any programming language. It’s currently difficult to integrate models that use different technologies (e.g. TensorRT, Ludwig, TorchScript, JAX, GGML, etc) into your application, especially if you’re not using Python. Even if you learn the details of integrating each of these frameworks, running multiple frameworks in one process can cause hard-to-debug crashes. Ideally, the ML framework a model was developed in should just be an implementation detail. Carton lets you decouple your application from specific ML frameworks so you can focus on the problem you actually want to solve. At a high level, the way Carton works is by running models in their own processes and using an IPC system to communicate back and forth with low overhead. Carton is primarily implemented in Rust, with bindings to other languages. There are lots more details linked in the architecture doc below. Importantly, Carton uses your model’s original underlying framework (e.g. PyTorch) under the hood to actually execute the model. This is meaningful because it makes Carton composable with other technologies. For example, it’s easy to use custom ops, TensorRT, etc without changes. This lets you keep up with cutting-edge advances, but decouples them from your application. I’ve been working on Carton for almost a year now and I open sourced it on Wednesday! Some useful links: Website, docs, quickstart - https://carton.run Explore existing models - https://carton.pub Repo - https://github.com/VivekPanyam/carton Architecture - https://github.com/VivekPanyam/carton/blob/main/ARCHITECTURE.md Please let me know what you think! submitted by /u/vpanyam [link] [comments]  ( 10 min )
    [P] Location Computation
    Hi Everyone, I’m doing a project where I’m crowdsourcing a lot of location data for a set of location labels and then trying to cluster it for each and using the centroid of the cluster as the most accurate location for that location label. The data keeps coming in everyday. I’m not sure when to stop computation. Initially I thought I’ll check the delta between each days centroid computed and if the delta falls under a threshold then stop computing. But now I’m thinking if my daily data collected gets marked as outliers, subsequent days centroids won’t have much of a delta and it will pass my convergence condition. Any suggestions? submitted by /u/Longjumping-Song4958 [link] [comments]  ( 9 min )
    [D][R] Deploying deep models on memory constrained devices
    Suppose we want to use a deep learning model on a gpu within our app. We want this model to coexist on the gpu with other processes, effectively limit it's possible usage of resources. As cuDNN/cuBLAS routines are nondeterministic and possibly dynamically allocate variable amount of memory, how do people manage this problem? Is it a problem at all? Estimating memory usage of deep learning models on gpu is notoriously hard. There is a research paper from Microsoft tackling this problem and they mispredict the usage of memory by 15% on average. Some cpu BLAS libraries like openBLAS or MKL also dynamically allocate the memory, but there are alternatives - LAPACK as far as I know uses only the memory provided by the caller, making it viable option for applications in embedded. In safety crit…  ( 10 min )
    [D] Best Sequence Embedding Models?
    Which are currently the best Sentence Embedding pre-trained models out there? submitted by /u/Uilxitora [link] [comments]  ( 9 min )
    [D] Using Gamification to demystify the AI black-box
    Blog about AI "black box" nature and how it can be explained and become engaging to users using gamification. Explained with example from open-appsec an open-source machine learning-based Web Application & API Security product. https://www.openappsec.io/post/using-gamification-to-demystify-the-ai-black-box-in-a-waf-product https://github.com/openappsec/openappsec submitted by /u/onirisapp [link] [comments]  ( 9 min )
    [Project] Startup Job Post/Contractor role
    Hey all! I'm in the throws of doing a startup and looking for someone to help build a legal tech platform. I can discuss more in person, but it is intended to be a human/lawyer in the loop workflow tool for complex contract and deal analysis. Base product is built and deployed. I'm a former developer turned lawyer with 15 years corporate experiences, and need help/talent/co-founder to help take things to the next level. Ideally you have a mixture of NLP and regular software engineering background and just a very practical approach. If you've played with LLM's all the better. Options for cash, equity, larger roles are all on the table. Just looking for the right talent. DM me if you are interested and lets talk about experience, etc.! And it seems that tags are turned off in here, so not sure how to tag something as [Project] but I put it in the title. submitted by /u/pudgyplacater [link] [comments]  ( 9 min )
    [R] RealFill: Reference-Driven Generation for Authentic Image Completion
    Project page: https://realfill.github.io/ Paper: https://arxiv.org/abs/2309.16668 RealFill is able to complete the image with what should have been there. Abstract Recent advances in generative imagery have brought forth outpainting and inpainting models that can produce high-quality, plausible image content in unknown regions, but the content these models hallucinate is necessarily inauthentic, since the models lack sufficient context about the true scene. In this work, we propose RealFill, a novel generative approach for image completion that fills in missing regions of an image with the content that should have been there. RealFill is a generative inpainting model that is personalized using only a few reference images of a scene. These reference images do not have to be aligned with the target image, and can be taken with drastically varying viewpoints, lighting conditions, camera apertures, or image styles. Once personalized, RealFill is able to complete a target image with visually compelling contents that are faithful to the original scene. We evaluate RealFill on a new image completion benchmark that covers a set of diverse and challenging scenarios, and find that it outperforms existing approaches by a large margin. ​ submitted by /u/StrawberryNumberNine [link] [comments]  ( 9 min )
    [R] Listen2Scene: Interactive material-aware binaural sound propagation for reconstructed 3D scenes
    https://www.youtube.com/watch?v=aNJWCwG-H_U submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [R] M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec
    Paper : https://arxiv.org/abs/2309.07416 Demo : https://anton-jeran.github.io/MAD/ Code : https://github.com/anton-jeran/MULTI-AUDIODEC submitted by /u/Snoo63916 [link] [comments]  ( 9 min )
    [R] The Future of Romance: Novel Techniques for Replacing your Boyfriend with Generative AI (parody)
    submitted by /u/TobyWasBestSpiderMan [link] [comments]  ( 9 min )
    Classical NLP course [D]
    Classical NLP course recommendation Can you recommend me NLP course that dives into classical NLP methods: For example: HMM MaxEnt CKY algo Sytactic parsing Dependency Parsing submitted by /u/Thick-brain-dude [link] [comments]  ( 9 min )
    [D] Multi-task learning leads to overfitting. Is this the double descent phenomenon?
    I have a CNN model, call it model M. It was trained on dataset A for object pose estimation. After training for 100 epochs, it resulted in these losses: Train: 0.06 Val: 0.08 As dataset A is somewhat limited, I wonder if I can incorporate additional data via a different, but related task: object segmentation for similar objects. Model M is a UNet, so I can incorporate this task simply with an additional output channel in the last layer. I add dataset B for object segmentation. During training, M learns on both datasets quite well, which suggests to me that the tasks are well-aligned. After 100 epochs, I get these losses on dataset A: Train: 0.06 Val: 0.16 This is surprising to me. If I get the same training loss on dataset A, while training on additional data. I'd expect the validation loss to be lower, since I'm training on 2x the data. Yet the validation performance is consistently higher when I train on both datasets. The only explanation I can think of is the double descent phenomenon. Perhaps when I trained only on dataset A, I was significantly over-parameterized, but past the interpolation threshold. So perhaps adding more data brought me closer to the interpolation threshold, leading to worse generalization. Does this explanation seem likely? Has anyone had similar experiences? submitted by /u/murrdpirate [link] [comments]  ( 9 min )
    [D] What's the relationship between Denoising Autoencoders and Diffusion Models?
    Hello, denoising autoencoders is when you train something to reverse x+n -> x. This seems to be basically the same as a diffusion model, more so if you see the U-Net diffusion model, which is effectively an information bottleneck. submitted by /u/windoze [link] [comments]  ( 9 min )
    [D] How is this sub not going ballistic over the recent GPT-4 Vision release?
    For a quick disclaimer, I know people on here think the sub is being flooded by people who arent ml engineers/researchers. I have worked at two FAANGS on ml research teams/platforms. My opinion is that GPT-4 Vision/Image processing is out of science fiction. I fed chatgpt an image of a complex sql data base schema, and it converted it to code, then optimized the schema. It understood the arrows pointing between table boxes on the image as relations, and even understand many to one/many to many. I took a picture of random writing on a page, and it did OCR better than has ever been possible. I was able to ask questions that required OCR and a geometrical understanding of the page layout. Where is the hype on here? This is an astounding human breakthrough. I cannot believe how much ML is now obsolete as a result. I cannot believe how many computer science breakthroughs have occurred with this simple model update. Where is the uproar on this sub? Why am I not seeing 500 comments on posts about what you can do with this now? Why are there even post submissions about anything else? submitted by /u/corporate_autist [link] [comments]  ( 9 min )
    [P] vLLM with Mistral 7B guide
    Hey all - vllm==0.2.0 got released a couple of hours ago and I put together some code to get it running with the new Mistral 7B model. Also included are some benchmarks for different input batch sizes with the model (output capped at 200 tokens): Batch size Tokens /s 1 46 10 400 60 1.8k Hope it's useful, let me know if you'd like any more info! Here's the link: https://docs.mystic.ai/docs/mistral-ai-7b-vllm-fast-inference-guide submitted by /u/paulcjh [link] [comments]  ( 9 min )
  • Open

    Bing AI chat messages are being hijacked by ads pushing malware
    Bing AI chat messages are being hijacked by ads pushing malware. Malvertising has made its way to Bing's chatbot/search engine. Cybersecurity researchers observed a malicious ad being offered as part of the Chat-GPT, AI-powered answer to a search query. Malvertising is a practice where hackers trick ad networks into displaying ads that look legitimate but are actually malicious. Microsoft integrated Chat-GPT into Bing earlier this year and started monetizing it. When a user types in a query, they would get a result paired with sponsored links. In this instance, researchers were given a link that redirected them to a malicious site. Threat actors continue to leverage search ads to redirect users to malicious sites hosting malware. Bing Chat serves some of the same ads seen via a traditional Bing query. Source : https://www.techradar.com/pro/security/bing-ai-chat-messages-are-being-hijacked-by-ads-pushing-malware submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Crafting Virtual Worlds With Just Words. How AI Changes 3D World Building Forever.
    submitted by /u/Magic-Fabric [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com Meta AI presents Emu, a quality-tuned latent diffusion model for generating highly aesthetic images. Emu significantly outperforms SDXLv1.0 on visual appeal [Paper]. Meta AI researchers present a series of long-context LLMs with context windows of up to 32,768 tokens. LLAMA 2 70B variant surpasses gpt-3.5-turbo-16k’s overall performance on a suite of long-context tasks [Paper]. Abacus AI released a larger 70B version of Giraffe. Giraffe is a family of models that are finetuned from base Llama 2 and have a larger context length of 32K tokens [Details]. Meta announced [Details]: Meta AI - a new AI assistant users can interact with on WhatsApp, Messenger and Instagram. Will also be available on Ray-Ban Meta smart glasses and Quest 3, Meta’s mixed reality h…  ( 12 min )
    I Asked ChatGPT to be my Girlfriend: And it said Yes!
    This is a quick fun project, nothing serious at all, a personalized custom instruction for ChatGPT to make it conversational and maintain character throughout the interaction. If you want to learn more serious and useful prompt engineering techniques head on: r/PromptWizards. In all seriousness, even though this is just for fun, such applied prompt engineering for NPC in games, or online AI companion services can actually be relevant and useful in the future. By initializing this Girlfriend RolePlaying ChatGPT mode, you're not only interacting with an AI but with Sarah, 25, who is keen to explore several layers of a relationship with you. Each command you use brings you a different shade of companionship: /start - Let Sarah introduce herself. /chat - Engages you in a comforting and c…  ( 11 min )
    Exploring Jimmy Apples Claim: "The AGI has been achieved internally" - Detailed Reddit Investigation
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    This week's AI digest
    Planet friendly: Researchers say AI emits up to 1,500 times less CO2 than humans when producing a page of text. Some disagree. Banking: Currently, around 41% of US bank customers are digital only Meta is launching AI chatbots across its apps to retain younger users. Amazon invests $4 Billion in OpenAI competitor Anthropic. Emerging tiger: Nvidia’s CEO bets India will emerge as a major AI market. Regulation: OpenAI CEO Sam Altman advocates for AI regulation despite risks. Suspense: Elon Musk says AI image generation app Midjourney will be releasing “something significant” soon. submitted by /u/unbalanced_mind [link] [comments]  ( 9 min )
    CGPT-4, it is an indisputable fact that Trump waited three hours to end the insurrection violence. What do his supporters guess was his thinking during this time?
    Some posit he was genuinely surprised by the violence and needed time to assess the situation before acting. They argue that the chaotic nature of the events made immediate action complex, given the layers of command and decision-making involved. Others claim he might have been in deliberation with advisors to gauge the scale and implications of intervention, debating the potential backlash from his supporters or the legal ramifications. Another perspective suggests that he might have been contemplating how the events would affect the certification of the Electoral College results, given that his previous legal and political efforts to contest the 2020 election outcome had failed. In this view, he might have been waiting to see if the Congress would be influenced to halt or delay the certification. While some of his supporters may find these explanations plausible, critics argue that the delay represents a dereliction of duty or even tacit support for the violence. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    How to Connect ChatGPT to the Internet (Step-by-Step Guide)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Any "free" ai to turn text to speech?
    I am looking for an ai that will turn the text to speech and be free. submitted by /u/Korti213 [link] [comments]  ( 9 min )
    Looking for some help on a project
    Hey y’all, I’ve been seeing these clips everywhere of AI streamers, and I’ve been searching Everywhere for explanations of how to make one. I believe I understand the concepts, but I’m really at a loss for the avatar text to speech part. I believe I have it ready for collecting questions and getting it to ChatGPT for response/script, but im very stuck at using a photo for an avatar that can mouth the words and not take 3 mins per response. Any help is appreciated, I’ve been at this project for longer than I’d like lmao. The attached video is a random YouTube short for reference submitted by /u/Lipoz69 [link] [comments]  ( 9 min )
    He got Facebook hooked on AI. Now he can't fix its misinformation addiction
    Facebook's addiction to spreading misinformation and hate speech is a result of its AI algorithms. Joaquin Quiñonero Candela, a director of AI at Facebook, was tasked with fixing the problem but was only focused on addressing AI bias. The Responsible AI team failed to make headway against misinformation and hate speech because it never made those problems its main focus. The spread of lies and hate speech on Facebook has only grown, contributing to genocidal campaigns and the promotion of dangerous falsehoods. The algorithms that underpin Facebook's business were designed to maximize engagement, not filter out false or inflammatory content. Source : https://www.technologyreview.com/2021/03/11/1020600/facebook-responsible-ai-misinformation/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Album covers but Morgan Freeman
    submitted by /u/TheGhettoControversy [link] [comments]  ( 9 min )
    Google is expanding its AI-powered search experience to teenagers
    Google's AI-driven search experience, Search Generative Experience (SGE), is now accessible to teenagers between 13-17 in America. Entailments include a conversational mode for searches, which Google believes can help youngsters pose atypical questions to dig deeper. For the latest advancements in AI, look here first. Teen-friendly AI search SGE introduces a conversational mode to Google Search, allowing users to ask questions and follow-ups in a more natural language. To prevent harmful content from surfacing, Google has placed guardrails, providing stronger protections related to illegal and age-gated substances, or bullying. Features and improving AI accuracy Google is rolling out "About this result" to provide users with more context about the displayed content. Google acknowledges and addresses any validation of false or offensive claims by the AI-powered response, ensuring to provide higher quality and more accurate responses. It’s also using large language models to self-critique and rewrite draft responses on sensitive topics based on quality and safety principles. SGE's popularity and future plans Since SGE's introduction, it has found popularity, especially among younger users who prefer a conversational approach. Google plans to expand SGE outside the U.S. to India and Japan and improve its services with support for videos, images, local info, and more. It's also experimenting with ads positioned next to the AI-generated responses. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI and tech. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    Build a crop segmentation machine learning model with Planet data and Amazon SageMaker geospatial capabilities
    In this analysis, we use a K-nearest neighbors (KNN) model to conduct crop segmentation, and we compare these results with ground truth imagery on an agricultural region. Our results reveal that the classification from the KNN model is more accurately representative of the state of the current crop field in 2017 than the ground truth classification data from 2015. These results are a testament to the power of Planet’s high-cadence geospatial imagery. Agricultural fields change often, sometimes multiple times a season, and having high-frequency satellite imagery available to observe and analyze this land can provide immense value to our understanding of agricultural land and quickly-changing environments.  ( 15 min )
  • Open

    Innovative Endeavors: Meta Introduces AI-Powered Tools and Smart Glasses
    submitted by /u/Allinhalf [link] [comments]  ( 9 min )
    Pruning a specific dimension in a neural network using L1-norm
    I've been playing around with pruning neural networks. One interesting thing I've found is that pruning the weights with the lowest L1-norm along a specific dimension seems to give better results than simply pruning all of the weights with the lowest L1-norm (which I believe is the standard method; for example this is what torch.nn.utils.prune.l1_unstructured does). Does anyone have an explanation for why this might be, or knows of any research in this area? I'm aware that structured pruning removes entire channels in a specific dimension. But I'm referring to unstructured pruning here, where I remove a subset of the weights along a specific dimension. Admittedly I've only done very limited benchmarking of this. See this repo for my implementation, and some benchmark details. submitted by /u/Neilf79 [link] [comments]  ( 9 min )
    Help understanding ai, Specificaly cnn cause i want to try training a model on mnist data set as my first project
    Hello, so i learnt the very basics of ai and im trying to understand how nn works, this is what i have figured out so far. so if i have a 4x4 image e.g 0 1 1 0 1 0 0 1 1 1 1 1 1 0 0 1 i pass it across a 2x2 kernal e.g 1 1 0 3 ​ and padding it would do ​ dot product of 0 1 1 0 ​ x ​ 1 1 0 3 ​ is 1 ​ ​ and if we do that to all of them we get a new matrix ​ 1 2 4 4 1 3 5 4 4 ​ ​ then we have padding same so this becomes ​ 0 0 0 0 1 2 4 0 4 1 3 0 5 4 4 0 ​ ​ we then turn it into a feature map, basically flatenting it to something like this 0,0,0,0,1,2,4,0,4,1,3,0,5,4,4,0 ​ so the input has 16 features, if we have a layer of 3 nerons that fire with relu activation funciton and each weight is alternating between 1 and 2 for simplicity sake . we would do 0*1 + 0*2 + 0*1 .... 4*2 + 0*1 = 32 so if we are using relu, we would do is 32 > 0? if so we pass 32 to next neuron if not we pass 0? ​ idk the rest, i guess i forgot what uni taught me 😅 ​ hers a diagram i drew, maybe you can help me figure out hte rest, im confused how the output layer works i guess ​ ​ ​ ​ submitted by /u/SaadPaad2003 [link] [comments]  ( 9 min )
    help understanding basics of neural networks, cnn's to be exact
    Hello, so i learnt the very basics of ai and im trying to understand how nn works, this is what i have figured out so far. so if i have a 4x4 image e.g 0 1 1 0 1 0 0 1 1 1 1 1 1 0 0 1 i pass it across a 2x2 kernal e.g 1 1 0 3 ​ and padding it would do ​ dot product of 0 1 1 0 ​ x ​ 1 1 0 3 ​ is 1 ​ ​ and if we do that to all of them we get a new matrix ​ 1 2 4 4 1 3 5 4 4 ​ ​ then we have padding same so this becomes ​ 0 0 0 0 1 2 4 0 4 1 3 0 5 4 4 0 ​ ​ we then turn it into a feature map, basically flatenting it to something like this 0,0,0,0,1,2,4,0,4,1,3,0,5,4,4,0 ​ so the input has 16 features, if we have a layer of 3 nerons that fire with relu activation funciton and each weight is alternating between 1 and 2 for simplicity sake . we would do 0*1 + 0*2 + 0*1 .... 4*2 + 0*1 = 32 so if we are using relu, we would do is 32 > 0? if so we pass 32 to next neuron if not we pass 0? ​ idk the rest, i guess i forgot what uni taught me 😅 ​ hers a diagram i drew, maybe you can help me figure out hte rest, im confused how the output layer works i guess ​ ​ https://preview.redd.it/h07o5y6847rb1.png?width=1859&format=png&auto=webp&s=df1cdf73ea64ff93ac872dfe8248722e8befd31d ​ ​ submitted by /u/WranglerParty5452 [link] [comments]  ( 9 min )
    Adapt GAN
    Hi everyone, Im new to the Neural network and I wanted some advice : I wanted to generate grayscale images with certain properties : - distribution of pixels values, space correlation between pixels, etc... I already know the type of result that I need, but I wanted to know if a neural network especially a GAN was capable to produce images fitting me requirements. I was thinking that maybe I could change the GAN architecture such as : 1)the Real data inputs (normally images feed to discriminator) will simply be the statistical parameters that I am expecting. 2) I'll add a measure of the various statistical parameters on all the synthetic images generated. 3)Finally the discriminator will only based itself on the statistical parameters comparison for weights updates. Does such network make sense ? If so I have trouble finding a way of implementing it but that is an other story. Right know I want to know if this is doable ? If not do you have any alternative suggestion for my issue ? Thanks all for your advice ! submitted by /u/Hectorite [link] [comments]  ( 9 min )
    Why Batch Norm Works
    submitted by /u/Personal-Trainer-541 [link] [comments]  ( 9 min )
  • Open

    Meta's Technological Marvel: AI-Powered Tools and Intuitive Smart Glasses
    submitted by /u/Allinhalf [link] [comments]  ( 9 min )
    Why is dyna Q not outperforming Q learning in terms of sample efficiency?
    I coded a dyna Q implementation based on the algorithm given in Sutton's book over here. However, it seems like both are equally sample efficient on the cliff walking environment. Here is my code. These are my results - ​ ​ https://preview.redd.it/z7xwow5hz7rb1.png?width=585&format=png&auto=webp&s=90b33eb4c754e199e9bf15499a78e0f42e05f5d2 The only think that came to my mind was to increase the model sampling rate (`n_iters`). Even after assigning a large value to it, the performance doesn't change. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    How can I config and build MJPC c++ software?
    I'm trying to install and run this open-source project https://github.com/google-deepmind/mujoco_mpc. It's called MJPC, and it's a C++ software that displays a real-time interactive interface. I've cloned the code, installed CMake, and gcc version 13.1.0 to run C++20. I've also installed the CMake Tools and C/C++ extensions in VSCode as instructed. However, I'm not sure what to do next. I have no experience with C++ and software coding, configuring in VSCode, or building it. Please help me if you can, provide detailed guidance. submitted by /u/Nghiattk27 [link] [comments]  ( 9 min )
    LLM Agents for RL envs
    Has anyone here tried using LLM Agents to solve RL environments? I'm curious about your experiences. Considering that performing a single action involves a chain of thoughts, how fast did your experiments go? Please feel free to add any additional comments about this. Cheers! submitted by /u/stinoco [link] [comments]  ( 9 min )
    Shape Formation with Multi-Agent Reinforcement Learning
    Hey everyone, I'm trying to write MARL code with MAPPO policy to train three agents to form a triangle shape. I'm relatively new to RL, having completed the fundamentals, but I'm struggling to come up with suitable resources which can teach me how to implement codes on python. I'd be really greatful if someone could share some insights or useful resources where I can learn to code and implement MARL. submitted by /u/The_One263 [link] [comments]  ( 9 min )
    Shape Formation with Multi-Agent Reinforcement Learning
    Hey everyone, I'm trying to write MARL code with MAPPO policy to train three agents to form a triangle shape. I'm relatively new to RL, having completed the fundamentals, but I'm struggling to come up with suitable resources which can teach me how to implement codes on python. I'd be really greatful if someone could share some insights or useful resources where I can learn to code and implement MARL. submitted by /u/The_One263 [link] [comments]  ( 9 min )
    Curiosity/ Exploration with Rllib
    Hi! I’ve been training a MultiAgentEnv with Curiosity, but I’d like to extend my action space to be a Dictionary. Are there any similar modules I could use instead or is there any way to use Curiosity with a Dictionary consisting of a Box and a Discrete action space. Thank you! submitted by /u/tessherelurkingnow [link] [comments]  ( 9 min )
  • Open

    Regular solids and Monte Carlo integration
    Monte Carlo integration is not as simple in practice as it is often introduced. A homework problem might as you to integrate a function of two variables by selecting random points from a cube and counting how many of the points fall below the graph of the function. This would indeed give you an estimate […] Regular solids and Monte Carlo integration first appeared on John D. Cook.  ( 6 min )
  • Open

    Who will benefit from AI?
    In campus talk, Daron Acemoglu offers vision of “machine usefulness,” rather than autonomous “intelligence,” to help workers and spread prosperity.  ( 11 min )
  • Open

    Heeding Huang’s Law: Video Shows How Engineers Keep the Speedups Coming
    In a talk, now available online, NVIDIA Chief Scientist Bill Dally describes a tectonic shift in how computer performance gets delivered in a post-Moore’s law era. Each new processor requires ingenuity and effort inventing and validating fresh ingredients, he said in a recent keynote address at Hot Chips, an annual gathering of chip and systems Read article >  ( 6 min )

  • Open

    Cross-platform way to enter Unicode characters
    The previous post describes the hoops I jumped through to enter Unicode characters on a Mac. Here’s a script to run from the command line that will copy Unicode characters to the system clipboard. It runs anywhere the Python module pyperclip runs. #!/usr/bin/env python3 import sys import pyperclip cp = sys.argv[1] ch = eval(f"chr(0x{cp})") print(ch) […] Cross-platform way to enter Unicode characters first appeared on John D. Cook.  ( 5 min )
    Using Unicode on MacOS
    Setting up Unicode on my MacBook took some research, so I’m leaving myself a note here if I need to do it again. Maybe it’ll help someone else too. From the System Settings dialog, go to Keyboard and click the Edit button next to Input Sources. Click on the + sign in the lower left […] Using Unicode on MacOS first appeared on John D. Cook.  ( 5 min )
  • Open

    The Creator (2023) movie discussion
    In theaters now. PG-13. Synopsis from Fandango (mild spoilers) From writer/director Gareth Edwards (“Rogue One,” “Godzilla”) comes an epic sci-fi action thriller set amidst a future war between the human race and the forces of artificial intelligence. Joshua (John David Washington, "Tenet"), a hardened ex-special forces agent grieving the disappearance of his wife (Gemma Chan, "Eternals"), is recruited to hunt down and kill the Creator, the elusive architect of advanced AI who has developed a mysterious weapon with the power to end the war… and mankind itself. Joshua and his team of elite operatives journey across enemy lines, into the dark heart of AI-occupied territory… only to discover the world-ending weapon he’s been instructed to destroy is an AI in the form of a young child (newcomer Madeleine Yuna Voyles). Trailer If there is any other media I should make threads for just let me know- could be video games, television, etc. submitted by /u/jaketocake [link] [comments]  ( 9 min )
    Aryn comes out of stealth to bring GenAI to OpenSearch and data preparation
    Aryn, a team with experience in AWS big data and database services, has come out of stealth and raised $7.5M in series seed funding. Their mission is to bring generative AI to OpenSearch and data preparation. They aim to use generative AI models to process unstructured data for tasks such as information extraction, question-answering, summarization, and content generation. Aryn's conversational search approach empowers users to interact with their unstructured enterprise data. They have developed a conversational search stack consisting of a semantic data preparation system called Sycamore, semantic search with OpenSearch, and conversational capabilities in OpenSearch. Generative AI powers each component of the stack, leading to higher quality answers and ease of use. Developers can quickly build and deploy applications like question-answering, chatbots, and research platforms using Aryn's stack without needing expertise in AI and search. Aryn's stack is 100% open source, making it accessible to developers. Source : https://blog.aryn.ai/aryn-bringing-generative-ai-to-opensearch-and-data-preparation submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Why does this read like someone used chatdev and gave it a marketing agent named clover with access to a reddit account?
    submitted by /u/Lesbianseagullman [link] [comments]  ( 9 min )
    Meta Unfolds a 'Universe of AI' Across Instagram, Facebook, and WhatsApp
    Meta has unveiled colossal AI updates peppered across its platform that would fundamentally alter user experiences on Instagram, Facebook, and WhatsApp, opening up a "universe of AI" solutions. For the latest advancements in AI, look here first. https://preview.redd.it/bl81rlbqp1rb1.png?width=2048&format=png&auto=webp&s=be44b8ebae8f65b53eb82fe2a78b45f19260c452 Spearheading the AI Universe - Meta AI Chatbot The “advanced conversational assistant” is set to enhance Messenger, WhatsApp, and Instagram services and will be incorporated into upcoming Ray-Ban Meta smart glasses and Quest 3. Real-time information capabilities have been bolstered through a partnership with Microsoft Bing, and image generation is powered by a new model, Emu. A Galaxy of AI Personalities Meta rolled out 28 AIs in beta, featuring sterling personas such as Snoop Dogg, Tom Brady, Kendall Jenner, and Naomi Osaka, thus amplifying the interactivity quotient. AI Studio - Empowering Businesses The AI Studio Platform is equipped to enable businesses to build AI chatbots for messaging services on Facebook, Instagram, and Messenger. Also, Meta will provide a sandbox tool in the upcoming year for users to experiment with creating their own AI. Generative AI Stickers - A New Co-creating Experience AI editing tools will allow users to edit images and co-create content with friends. The tool uses Llama 2 and the new image generation model, Emu, to convert text prompts into stickers in seconds. Ray-Ban Smart Glasses with Meta AI The Ray-Ban smart glasses are equipped with Meta AI, allowing users to receive information, incite creativity, and manage the glasses using just their voice. (source) P.S. If you like this kind of analysis, I write a free newsletter with the latest and most impactful news in AI. Professionals from Google, Meta, and OpenAI read it daily. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Get a job as a Prompt Engineer - Challenge: generate SAT-Style Multiple Choice Questions.
    One member on r/PromptWizards just applied for a job as a Prompt Engineer in a company, and they tasked him to craft a prompt system that generates high-quality SAT-style multiple-choice questions. Quite a quest, right? Well, stick around, and we'll take a deep dive into the prompt engineering we used to help him. The mission was precise: Write a prompt to yield an SAT-style multiple-choice question that rigorously tests a student's understanding of "Algebraically solving systems of 2 linear equations in 2 variables". The challenge didn't end there; the question produced had to meet the hard/difficult mark set by real SAT questions. Using the OpenAI Playground, we conducted incisive iterations, testing each prompt separately to mitigate any bias from prior outputs. Our approach was: - …  ( 11 min )
    Warner on AI regulation: ‘We probably can't solve it all at once’
    submitted by /u/smo279 [link] [comments]  ( 9 min )
    Courses for more Seniors
    Hello all, What course would you recommend for those of us who are older and already settled in other careers. For example I'm 35 and a manager so I wouldn't need a course to actually design AI or anything. It would be more related to understanding how/where to implement it in an organisation. Any tips? Cheers and merci submitted by /u/JYanezez [link] [comments]  ( 9 min )
    Show-1: Marrying Pixel and Latent Diffusion Models for Efficient and High-Quality Text-to-Video Generation
    A new paper proposes Show-1, a hybrid model that combines pixel and latent diffusion for efficient high-quality text-to-video generation. Both of these approaches have tradeoffs, so researchers at the National University of Singapore tried a hybrid approach combining both, and shared the results in a paper published yesterday. My highlights from the paper: Pixel diffusion excels at low-res video generation precisely aligned with text Latent diffusion acts as efficient upsampling expert from low to high res Chaining the two techniques inherits benefits of both Show-1 achieves strong alignment, quality, and 15x less inference memory The key is using pixel diffusion for the initial low-resolution stage. This retains alignment with text descriptions. Latent diffusion then serves as a super-resolution expert, upsampling efficiently while preserving fidelity. By blending complementary techniques, Show-1 moves past tradeoffs limiting the individual models. More details here. Paper is here (includes links to example generations). submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    What AI makes images that subtle forms a word like this one?
    submitted by /u/samuraiogc [link] [comments]  ( 9 min )
    Getting emotional with LLMs can increase performance by 115% (Case Study)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Question about a small project
    Me and my sister have a small project we are thinking about working on. The idea is basically that we are going to enter the same prompt, separately, into an image generating a.i of some sort (Dalle2 etc) for a period of time and hopefully see the result change. We would probly pick words or frases that are topical and debated. This only works though if the a.i isnt just trained on old data and has active connection to the internet. MY question is therefor, is there an a.i right now that would fit the task? Sorry if the question is dumb or i didnt explain myself clearly! submitted by /u/Mejwynn [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/27/2023
    ODIN integrates Large Language Models (LLMs) into Obsidian using LangChain, allowing you to ask questions about the data stored in your knowledge graph right from the prompt bar.[1] ChatGPT users can now browse internet, OpenAI says.[2] Adobe’s Photoshop on the web launch includes its popular desktop AI tools.[3] The White House plans to introduce a highly anticipated executive order in the coming weeks dealing with artificial intelligence, President Joe Biden said Wednesday.[4] Sources: [1] https://github.com/memgraph/odin [2] https://www.reuters.com/technology/openai-says-chatgpt-can-now-browse-internet-2023-09-27/ [3] https://www.theverge.com/2023/9/27/23892889/adobe-photoshop-for-the-web-firefly-ai-generative-fill-full-release-price-date [4] https://www.cnn.com/2023/09/27/tech/joe-biden-executive-order-artificial-intelligence/index.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Multimodal AI's will cause people to embrace their own reality bubbles and that is bad news for dictatorships
    I have been messing with llama. Trying to make a script to make a movie. Sort of realize it is not there yet, it can write decoherent long stories / what ever you want. You can couple it with stable diffusion to make images that would have to be described better to fit the "movie" or narrative. It is not there yet, ChatGPT can already do this, you can ask it to tell you a story and describe the visual scenes. At the same time, we are getting audio generation from things like audioldm2 and stableaudio etc. Multimodal AI's are almost here. Pretty soon we will have devices in our pockets powered by AI chips that will be able to generate what ever reality we want. We can feed them images from our past and they can allow us to live in VR reality of the past. Or we can choose to live in anot…  ( 10 min )
    Jazz Fusion (AI Generated DnB & Jazz music and video)
    submitted by /u/LibeerCZ [link] [comments]  ( 9 min )
  • Open

    Modern reinforcement learning for video game NPCs
    submitted by /u/akliyen [link] [comments]  ( 9 min )
    Reinforcement learning in automating game testing
    The role of Reinforcement learning in automating game testing is becoming increasingly crucial, making it more efficient and effective. Manual testing, while essential, is extremely time-consuming and subject to human error. Our opensource library SheepRL 🐑 can be used to test whether the game dynamics is well defined: what if a player can finish the game with just a few moves? 🎮 This video shows that our agent (Kasumi, on the left) is able to win the game in the hardest modality by standing down and throwing kicks. 🥊 This can be helpful for a game developer to: ​ understand where and how intervene to achieve a more playful game predict and correct bugs early in the game development process enhance the gaming experience and final product quality reduce time and resources spent on debugging. The game has changed 🔥 and it is up to us to play it with (human + artificial) intelligence! Thanks to u/DIAMBRA_AIArena for the video! --- ❌ Are you interested in joining the project community? Get in touch ❌ SheepRL 🐑 is open-source, fully written in PyTorch and accelerated with LightningFabric - by Lightning AI Feel free to use it for your Artificial Intelligence projects, and if you want to contribute, we are more than happy to accept your pull requests! ❤️ https://reddit.com/link/16uht6v/video/ve3derxsc0rb1/player submitted by /u/Manu_Orobix [link] [comments]  ( 9 min )
    Proofs in the original Q-Learning technical notes
    I'm not sure it's the right place for this, but I was going through the proofs in the "original" 1992 technical notes of Q-learning, and a couple of points raised some questions: 1) In the Proof of lemma B.4: https://preview.redd.it/7g6pputdqwqb1.png?width=1006&format=png&auto=webp&s=fe4afeac3b06deee6c80105b280a0085bdcfbe51 where do P_{xy}^2(a_2) and R_x(a_2) come from? If we apply the definitions of Q'(x, a_1, a_2) and Q(x, a_1, a_2) to get the bound, P_{xy}^2(a_2) and R_x(a_2) should not be there. Are they just notation errors or is it correct and I'm missing something? ​ 2) I don't quite get how the bounds on P and R are computed in Section 3.2: https://preview.redd.it/p06ysjewqwqb1.png?width=962&format=png&auto=webp&s=a5929e701099dc6e4543efe7681f96f12f543fa8 Considering the results in B.4 (i.e., the bounds for the distance between the chain's P, R and the real ones), I don't understand how they arrive at this conclusion. ​ I'd greatly appreciate any intuitions about these, or if someone can point me in the right direction :) submitted by /u/Beautiful_Zebra_198 [link] [comments]  ( 9 min )
  • Open

    [N] We Collaborated with Outerbounds to Enable HPC and Ray Integration in Metaflow
    Here is our blog post - please check it out: https://forums.autodesk.com/t5/engineering-hub-blog/autodesk-and-outerbounds-partner-to-open-source-ray-and-hpc/ba-p/12254816 And try out the metaflow-ray extension here: https://github.com/outerbounds/metaflow-ray submitted by /u/rirhun [link] [comments]  ( 9 min )
    [D] What are the options for the most human TTS?
    So I’ve been using elevenlabs but it burns through characters really fast. What are the best options for the most human sounding TTS available? I’ve been looking into tortoise, but would like to see if there are other options I should be looking into. submitted by /u/Long8D [link] [comments]  ( 9 min )
    [D] How do we know Closed source released benchmarks aren't being heavily optimized, through outside means?
    I've recently started working with ML and NLP, so I'm sorry if this sounds Naive. Unlike Llama 2 or other open source, we don't have access to the model weights for GPT-4, Claude or Bard, so Benchmark Evals are being run through either APIs or the chat Interface. So how do we know that the model isn't being Boosted by custom web-searching abilities or RAG? While GPT-4 might have a turnoff option, I'm pretty sure Bard is always online, being built by google. So how do we trust benchmarks? Also, have any opensource been tested after Websearch/RAG? submitted by /u/vatsadev [link] [comments]  ( 9 min )
    [R] Searching for a regression dataset with structure in its prediction
    I am searching for a relatively simple dataset to train a regressor that has some structure in its predictions. Can't be too tiny cause I have to try out a NN architecture. It must have at least some continuous feature but can also have additional categorical or related discrete structures. I usually work with vision tasks, so I am not sure if I miss something obvious I could try? Open for ideas! I thoughts about predicting rows in some tabular dataset? Anything suitable that comes to mind? submitted by /u/LeanderKu [link] [comments]  ( 9 min )
    [N] CUDA Architect and Cofounder of MLPerf: AMD's ROCM has achieved software parity with CUDA
    Greg Diamos, the CTO of startup Lamini, was an early CUDA architect at NVIDIA and later cofounded MLPerf. He asserts that AMD's ROCM has "achieved software parity" with CUDA for LLMs. Lamini, focused on tuning LLM's for corporate and institutional users, has decided to go all-in with AMD Instict GPU's. https://www.crn.com/news/components-peripherals/llm-startup-embraces-amd-gpus-says-rocm-has-parity-with-nvidia-s-cuda-platform submitted by /u/makmanred [link] [comments]  ( 9 min )
    [P] Request to test Mirage: A platform to search and generate images, videos, audio, and 3D assets using natural language
    Mirage is the infinite asset library that helps you find or create the perfect digital asset. 🗨️ Just Search Naturally: No awkward keywords—Mirage understands you. 🤖 State-of-the-Art Models: Can't find it? Generate it, thanks to open-source models. 🔍 Similarity Search: Discover more of what you love with a single click. 🤗 Fully Personalized: Our AI librarian learns your style to show you assets you'll dig. Website Link: MirageML Open-Source Github: Github Development Status: Beta I would love to get some honest feedback! submitted by /u/perception-eng [link] [comments]  ( 9 min )
    [P] Request to test Domeis: A new platform for no-code Machine Learning
    Domeis is a no-code Machine Learning platform that offers a dashboard to design, train and test Machine Learning algorithms, as well as to import, pre-process and cleanse data, all from the Graphical User Interface and without writing a single line of code. The aim of this platform is two-fold: Making Machine Learning accessible to anyone and not just Data Scientists or experienced software developers. By offering the possibility to design, train and test Machine Learning models directly via GUI, being an experienced software developer is no longer a pre-condition for creating ML models Making Machine Learning model creation, training and testing faster for experienced Data Scientists / Machine Learning Engineers. By drastically reducing the time needed to set up environments, import data and define models, Domeis allows Machine Learning practitioners to focus on trying out and compare different models/approaches. Website Link: https://www.domeis.it/ Development Status: Alpha I would love to get some honest feedback! submitted by /u/Ok_Hold_5385 [link] [comments]  ( 9 min )
    [D] Help understanding convergence proof (Adaptive learning rate + Momentum)
    Hello everyone, I am trying to understand the convergence analysis/derivation of the momentum algorithm, or the stochastic heavy ball algorithm, using the regret bound analysis from different research papers. https://ieeexplore.ieee.org/document/7330562 - Page3 https://www.mdpi.com/2504-3110/6/12/709 - Page6 http://arxiv.org/abs/1707.01647 - Page4 ​ In the derivation, there is the following simplification, which I do not understand at all ​ $\frac{2\boldsymbol{\eta}_{k}}{(1-\beta)}\sum_{k=0}^{T}\left|J(\theta_k) - J(\theta^*)\right| + \frac{2\boldsymbol{\eta}_{k}\beta}{(1-\beta)^2} \sum_{k=0}^{T}\left|J(\theta_k) - J(\theta_{k-1})\right| \leq \ \left|\boldsymbol{\theta}_{0} + \boldsymbol{p}_{0} - \boldsymbol{\theta}^* \right|^2 - \left|\boldsymbol{\theta}_{T+1} + \boldsymbol…  ( 9 min )
    [D]Any researchers or institutions in USA that follows Ai-compression relationships specifically like deepmind
    I have tried to follow the main collaborators of Hutter and other prominent scientists to track this, but they are mostly in Europe with some in Australia. American institutions seems to be more interested in more open ai like deep neural networks. If anyone is familiar with any US based institutions that does notable work in this line,please comment submitted by /u/Netero1999 [link] [comments]  ( 9 min )
    [R] Brain Tumor segmentation
    Can any of you suggest me computer science research ideas related to brain tumor segmentation using UNet. submitted by /u/Eleonora467 [link] [comments]  ( 9 min )
    [P] BionicGPT - ChatGPT replacement that let's you run R.A.G on confidential data
    BionicGPT is an open source WebUI that gives enterprises the ability to run Retrieval Augmented Generation (RAG) on their on premise documents. To allow people to get up to speed we deploy with a quantized 7B model that runs on CPU. Github Repo: https://github.com/purton-tech/bionicgpt We basically implement a RAG pipeline including document upload, embeddings generation and subsequent retrieval. Feedback: We'd love to get some feedback in the form or github issues or comments here. Screenshot: https://preview.redd.it/uiw0wqul30rb1.png?width=2447&format=png&auto=webp&s=8ad7e61ed048258c19aa63bf7c94d12da5b721fa submitted by /u/purton_i [link] [comments]  ( 9 min )
    [N] First Impressions with GPT-4V(ision)
    My colleague Piotr and I have been testing GPT-4V(ision) over the last day. We wrote up our findings, covering how GPT-4V performs on: Visual question answering (VQA) across a range of domains (locations, movies, plants) OCR Math OCR Object detection And more TL;DR: GPT-4V performed well for VQA and document OCR but struggled with OCR on real-world images and object detection (where we asked for bounding boxes). https://blog.roboflow.com/gpt-4-vision/ I would love to hear what other people have found working with GPT-4V. submitted by /u/zerojames_ [link] [comments]  ( 9 min )
    [R] NUS: Results of Combining Pixel and Latent Diffusion Models for Text-to-Video Generation
    A new paper proposes Show-1, a hybrid model that combines pixel and latent diffusion for efficient high-quality text-to-video generation. Both of these approaches have tradeoffs, so researchers at the National University of Singapore tried a hybrid approach combining both, and shared the results in a paper published yesterday. My highlights from the paper: Pixel diffusion excels at low-res video generation precisely aligned with text Latent diffusion acts as efficient upsampling expert from low to high res Chaining the two techniques inherits benefits of both Show-1 achieves strong alignment, quality, and 15x less inference memory The key is using pixel diffusion for the initial low-resolution stage. This retains alignment with text descriptions. Latent diffusion then serves as a super-resolution expert, upsampling efficiently while preserving fidelity. By blending complementary techniques, Show-1 moves past tradeoffs limiting the individual models. More details here. Paper is here (includes links to example generations). submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Linear Regression Queries [D]
    I am a beginner in Data Science. I have recently enrolled in the supervised machine learning algorithm by Andrew Ng in Coursera. I am now familiarized with linear regression, gradient descent. However, I faced a certain issue. In the optional lab, there was a task to calculate the value of the cost function using gradient descent for linear regression. I wrote the code in my notebook by myself and cross checked it to be correct. However, the desired output of w,b are very much different but the cost function yields a better result in my code. Another factor, I noticed that have to scale only thex variables, leaving the values of y. I have two major queries now: Is the yielding of different w,b values fine as long as the cost function is minimum? (w is a numpy array) Why do scale the x variables only? Why don't scale the y variables? Thanks in advance. submitted by /u/healing_you [link] [comments]  ( 9 min )
    [P] Hands-on open-source workflows for voice AI
    Hey r/MachineLearning, we made a tutorial that showcases typical workflows and tooling for voice analytics applications. The tutorial is intended for intermediate-level ML practitioners. The walkthrough is purely based on open source software and covers: Efficient interactive data exploration and inspection Dataset handling and inference on pre-trained models Model debugging and identification of critical data clusters Model comparison and selection ​ https://i.redd.it/j15gk3kkgyqb1.gif 🔗 Blog with code: https://medium.com/p/dbfd923a5a79#432e-3559ae606f80 🤗 Interactive demo: https://huggingface.co/spaces/renumics/emodb-model-debugging ​ ​ submitted by /u/44sps [link] [comments]  ( 9 min )
    [D] CV annotations and work with COCO/YOLO dataset
    Hi everyone. In my job I work with a lot of data for Computer Vision, and I use Label Studio for annotations. But the last time I've worked with it, I lost some of my annotations, which I need for other purposes. I have the final result as a YOLO and COCO dataset, but I cannot import the results from them to recover all I need. Can you suggest me good applications with an intuitive UI to import the COCO or YOLO dataset and work with labels? submitted by /u/thattallsoldier [link] [comments]  ( 9 min )
    [P] Request to Test PyMilo: A New Python Library for Machine Learning I/O
    Pymilo is an open-source Python package that offers an efficient, safe, and transparent method for transporting pre-trained machine-learning models. The motivation for developing this package is to eliminate the risks of binary or pickle formats. As this library is still in its early stages of development, it currently supports only a limited number of machine learning models provided by Scikit-learn. Nevertheless, it will be precious if the community utilizes this library and provides us with their feedback about improving the package's interface and prioritizing future developments. Your cooperation would be invaluable to us. In the following, I provide an example of how to utilize PyMilo. GitHub Repo: https://github.com/openscilab/pymilo Development Status: Alpha Simple Linear Mode…  ( 9 min )
    [Discussion] Interesting interview question
    Was asked something similar to the following question in an interview for a ML role and was curious how others would answer this: Say you have a dataset with one feature column and one label column (with different classes). Assume this data is too large to fit into memory and could be infinite in size (e.g data is coming in as a stream). How would you train a ML model on this data to accurately predict the label? Followup: instead of one feature column, what if you had several thousand? How would you decide which features to use given the size of the dataset? I discussed online sampling (resevoir sampling, etc) as a way to get a training dataset that could fit in memory + continually train on that but the interviewer did not seem convinced. Any thoughts? submitted by /u/scpdstudent [link] [comments]  ( 9 min )
    [D] What appropriate loss function to use for "Search recall" optimization?
    I'm studying the application of ML to improve searches. Here's a couple of example scenarios: Document retrieval (search) system: We have a (source) document with us and we're trying to find a matching document in a database. The source document has text and image attributes - for simplicity let's say a title and a single image. Each search result will also be a document - with a title and at most one image. A search engine: We have a query comprised of both text and an image (like google image search allows text to be added to the query as well). Each search result will be a website with text and image attributes (for simplicity, webpage title and at most one image) More generally, I have a search system - whatever we're trying to search for has text and an image associated with it…  ( 11 min )
    [D] How Does Your Organization Approach Machine Learning Projects Phase by Phase?
    How does the development process of a Machine Learning project unfold phase-by-phase within your organization? Could you please specify the type of organization you are, the time spent on each phase, as well as any aspects you consider to be weak or fundamental? It would also be great if you could share any tips or tricks you've learned that have changed your perspective. submitted by /u/Spiritual_Narwhal649 [link] [comments]  ( 9 min )
    [P] Rubik's Cube Square Detection
    Hello everyone, I am trying to detect the 9 squares of a face of a Rubik’s Cube through a camera. The idea is that I want to use my computer camera and tell the user to show all the Rubik’s Cube faces and read the faces so I can feed it to a solver. Here are the steps I have tried so far: Sharpened square edges Obtained binary image and removed noise Detected and extracted squares Some methods I used were using different blurs and cv functions but nothing worked. Sometimes, it can get all 9 squares but sometimes it doesn't. There also seems to be a difference for different colors; for example; the model can detect green squares easier than yellow squares. Can anyone provide advice as to how I can detect the squares on the face? ​ https://preview.redd.it/1ht9f4h31wqb1.png?width=2180&format=png&auto=webp&s=32d23515a43406c0f8828e6790ad71e754b0ab80 submitted by /u/uglyboi34 [link] [comments]  ( 9 min )
  • Open

    DynIBaR: Space-time view synthesis from videos of dynamic scenes
    Posted by Zhengqi Li and Noah Snavely, Research Scientists, Google Research A mobile phone’s camera is a powerful tool for capturing everyday moments. However, capturing a dynamic scene using a single camera is fundamentally limited. For instance, if we wanted to adjust the camera motion or timing of a recorded video (e.g., to freeze time while sweeping the camera around to highlight a dramatic moment), we would typically need an expensive Hollywood setup with a synchronized camera rig. Would it be possible to achieve similar effects solely from a video captured using a mobile phone’s camera, without a Hollywood budget? In “DynIBaR: Neural Dynamic Image-Based Rendering”, a best paper honorable mention at CVPR 2023, we describe a new method that generates photorealistic free-viewp…  ( 92 min )
    Re-weighted gradient descent via distributionally robust optimization
    Ramnath Kumar, Pre-Doctoral Researcher, and Arun Sai Suggala, Research Scientist, Google Research Deep neural networks (DNNs) have become essential for solving a wide range of tasks, from standard supervised learning (image classification using ViT) to meta-learning. The most commonly-used paradigm for learning DNNs is empirical risk minimization (ERM), which aims to identify a network that minimizes the average loss on training data points. Several algorithms, including stochastic gradient descent (SGD), Adam, and Adagrad, have been proposed for solving ERM. However, a drawback of ERM is that it weights all the samples equally, often ignoring the rare and more difficult samples, and focusing on the easier and abundant samples. This leads to suboptimal performance on unseen data, espe…  ( 92 min )
  • Open

    Accenture creates a Knowledge Assist solution using generative AI services on AWS
    This post is co-written with Ilan Geller and Shuyu Yang from Accenture. Enterprises today face major challenges when it comes to using their information and knowledge bases for both internal and external business operations. With constantly evolving operations, processes, policies, and compliance requirements, it can be extremely difficult for employees and customers to stay up […]  ( 8 min )
    Speed up your time series forecasting by up to 50 percent with Amazon SageMaker Canvas UI and AutoML APIs
    We’re excited to announce that Amazon SageMaker Canvas now offers a quicker and more user-friendly way to create machine learning models for time-series forecasting. SageMaker Canvas is a visual point-and-click service that enables business analysts to generate accurate machine learning (ML) models without requiring any machine learning experience or having to write a single line of code. SageMaker […]  ( 7 min )
    Robust time series forecasting with MLOps on Amazon SageMaker
    In the world of data-driven decision-making, time series forecasting is key in enabling businesses to use historical data patterns to anticipate future outcomes. Whether you are working in asset risk management, trading, weather prediction, energy demand forecasting, vital sign monitoring, or traffic analysis, the ability to forecast accurately is crucial for success. In these applications, […]  ( 10 min )
    Create a Generative AI Gateway to allow secure and compliant consumption of foundation models
    In the rapidly evolving world of AI and machine learning (ML), foundation models (FMs) have shown tremendous potential for driving innovation and unlocking new use cases. However, as organizations increasingly harness the power of FMs, concerns surrounding data privacy, security, added cost, and compliance have become paramount. Regulated and compliance-oriented industries, such as financial services, […]  ( 13 min )
    Beyond forecasting: The delicate balance of serving customers and growing your business
    Companies use time series forecasting to make core planning decisions that help them navigate through uncertain futures. This post is meant to address supply chain stakeholders, who share a common need of determining how many finished goods are needed over a mixed variety of planning time horizons. In addition to planning how many units of […]  ( 11 min )
    Announcing New Tools to Help Every Business Embrace Generative AI
    From startups to enterprises, organizations of all sizes are getting started with generative AI. They want to capitalize on generative AI and translate the momentum from betas, prototypes, and demos into real-world productivity gains and innovations. But what do organizations need to bring generative AI into the enterprise and make it real? When we talk […]  ( 13 min )
  • Open

    How will the Big Data market evolve in the future?
    Big data has been around for some time now, becoming a more or less common concept in business. However, recent developments in AI technology have shaken up an already volatile field, inviting us to reconsider our projections of how the big data market will look in the future. We can already see the signs that… Read More »How will the Big Data market evolve in the future? The post How will the Big Data market evolve in the future? appeared first on Data Science Central.  ( 22 min )
  • Open

    Kicking Games Up a Notch: Startup Sports Vision AI to Broadcast Athletics Across the Globe
    Pixellot is scoring with vision AI — making it easier for organizations to deliver real-time sports broadcasting and analytics to viewers across the globe. A member of the NVIDIA Metropolis vision AI partner ecosystem, the company based near Tel Aviv offers an AI-powered platform that automates the capturing, streaming and analysis of sporting events. It’s Read article >  ( 7 min )
    V for Victory: ‘Cyberpunk 2077: Phantom Liberty’ Comes to GeForce NOW
    The wait is over. GeForce NOW Ultimate members can experience Cyberpunk 2077: Phantom Liberty on GOG.com at full GeForce RTX 4080 quality, with support for NVIDIA DLSS 3.5 technology. It’s part of an action-packed GFN Thursday, with 26 more games joining the cloud gaming platform’s library, including Quake II from id Software. A New Look Read article >  ( 8 min )
  • Open

    AI Frontiers: Measuring and mitigating harms with Hanna Wallach
    Powerful large-scale AI models like GPT-4 are showing dramatic improvements in reasoning, problem-solving, and language capabilities. This marks a phase change for artificial intelligence—and a signal of accelerating progress to come.    In this Microsoft Research Podcast series, AI scientist and engineer Ashley Llorens hosts conversations with his collaborators and colleagues about what these models—and the […] The post AI Frontiers: Measuring and mitigating harms with Hanna Wallach appeared first on Microsoft Research.  ( 29 min )

  • Open

    How can AI recreate the lack of information?
    Hey there! Are there guys here who possess a strong grasp of AI neural network logic? ​ I've extracted a character from an anime scene using a mask, and saved it as a PNG sequence which contains solely the anime character along with an alpha (transparent) background. ​ I'm curious about how the Flowframes neural network can recreate the background that was originally behind the character but removed by the mask. It's impossible since the PNG images don't have that background info. ​ Can anyone explain how this works? ​ Attachments: - Image #1: https://preview.redd.it/z2bypfkstvqb1.png?width=1920&format=png&auto=webp&s=c534167c5ae4129c04f9b8b2fbca3bac350a1d4a - Image #2: https://preview.redd.it/x5kkzs2ttvqb1.png?width=1920&format=png&auto=webp&s=6838d7ca5e1e4f19ba46c04750fdaea537a787f0 (Don't mind the black background in the thumbnails, it's a bug, there's actually a transparent background) ​ * Flowframes is a app that utilizes advanced AI frameworks to interpolate videos in order to increase their framerate in the most natural looking way possible. submitted by /u/drkysqrl [link] [comments]  ( 9 min )
    (Pt. 2) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Graph Feature vector (embedding) [D]
    Hey all, I’m trying to do a regression algorithm for a dataset where I have a graph for each patient I have representing a location in their brain from MRI images. Right now, I don’t have a ton of data, so I’m looking for some way to take each graph I have and get a feature vector for it to put into a regression algorithm. So for 100 patients, I have 100 graphs, I’d like to have 100 feature vectors representing each patients graph. My issue is trying to find some algorithm that takes in the entire graph and outputs a single feature vector. I’ve been looking at some libraries but they all seem wildly scattered. I don’t want to grab a bunch of nose embeddings and do some elementary merge of them, like an average or sum, etc. Any help in pointing me to some Python libraries that can help me do this, or algorithms, or anything. Thank you so much. submitted by /u/kaleb7589 [link] [comments]  ( 9 min )
    Normalization in VAE[D]
    Normalization in VAE[D] Am training a variational auto encoder. First I tried with batch normalization before I send the data to the network and someone probably wisely pointed out that it's not correct. If I don't use batch norm then my training fails due to numerical instability. I then tried scaling my data before hand using standard scaler from sklewrn. And now my training works. Is this reasonable? Any other thoughts? submitted by /u/Global-Gene2392 [link] [comments]  ( 9 min )
    [P] Predicted stock data with TensorFlow is very different from actual data
    I'm following a YouTube video to create a simple machine learning model to predict stock prices. I have to reshape my prediction data so it works with inverse_transform, but in the video he doesn't do this. If I don't reshape it I get an error, but I think when I do reshape it it messes with the data. The predicted values are all very similar. I've tried messing with epoch and batch sizes, and changing other metrics like prediction_days, but nothing has worked. This is what the prediction data looks like when plotted, and this is what it looks like when printed. Does anyone know what could be causing this? Here's my code submitted by /u/darkshadowtrail [link] [comments]  ( 9 min )
    New subreddit rule idea [D]
    This subreddit will continue to die if it doesn't foster discussion of the latest research and reduce low-quality posts. However, making a judgement as to what is or is not low-quality is time-consuming and subjective -- not something the mods should be doing. To this end, I had the following new rule idea: If it's your first time at Fight If it's your first post in this subreddit, it needs to be a link to arxiv (Or, more generally, the number of your non-arxiv posts cannot exceed the number of your arxiv posts) All arxiv posts must be standard links to the abstract page (to catch reposts and to connect discussions of the same paper in different subreddits) An arxiv post must be a paper you've read yourself, and you should post a comment describing what you liked and DIDN'T like about it (Let the airing of grievances begin! I think this will help seed the discussion, which is really the raison d'être of this subreddit) If the post or the comment get downvoted, they do not count. What do you think? Will this help steer this subreddit in the right direction? Is this enforceable? submitted by /u/we_are_mammals [link] [comments]  ( 9 min )
    [D] How feasible is it to complete a course.
    Hi I am a physicist (1st year in masters) and I decided to take NN class (for cs students). I have a decent experience with python but I have never done low level coding. The class project requires a C++ implementation of NN with back propagation algorithm. I am quite confident in my learning ability, nonetheless, do you guys think it is feasible for me to code such a project in 13 weeks (I also have other subjects and cant just spend all my time on this)? Thanks submitted by /u/merimace [link] [comments]  ( 9 min )
    [P][D] Need Guidance on Building a Chatbot like ChatGPT for University Data - Help!
    Hey fellow Redditors, I find myself in quite a situation and could use some guidance. Recently, I introduced my professor to privateGPT and demonstrated its capabilities using a small set of college data. To my delight, he was impressed and has now tasked me with researching and developing a ChatGPT-like chatbot, but with access to our university's extensive data. Here's where I need your help: my professor wants this chatbot to be hosted on our university's systems due to privacy concerns, which means I can't use ChatGPT's API. I've been given access to Sol HPC, but I'm finding it quite confusing to get started. I'm looking for advice, tips, or any resources that can help me embark on this journey. Has anyone here worked on a similar project, or does anyone have experience with Sol HPC or building chatbots with local data sources? Any guidance or insights would be greatly appreciated! Thank you in advance for your help. This project means a lot to me, and I want to make sure I'm heading in the right direction. submitted by /u/ssshankyyy [link] [comments]  ( 9 min )
    [R] UNC Researchers Present VideoDirectorGPT: Using AI to Generate Multi-Scene Videos from Text
    Generating coherent videos spanning multiple scenes from text descriptions poses unique challenges for AI. While recent progress enables creating short clips, smoothly transitioning across diverse events and maintaining continuity remains difficult. A new paper from UNC Chapel Hill proposes VIDEODIRECTORGPT, a two-stage framework attempting to address multi-scene video generation: Here are my highlights from the paper: Two-stage approach: first a language model generates detailed "video plan", then a video generation module renders scenes based on the plan Video plan contains multi-scene descriptions, entities/layouts, backgrounds, consistency groupings - guides downstream video generation Video generation module called Layout2Vid trained on images, adds spatial layout control and cross-scene consistency to existing text-to-video model Experiments show improved object layout/control in single-scene videos vs baselines Multi-scene videos display higher object consistency across scenes compared to baselines Competitive open-domain video generation performance maintained The key innovation seems to be using a large language model to plot detailed video plans to guide overall video generation. And the video generator Layout2Vid adds better spatial and temporal control through some clever tweaks. The separation of these tasks seems to matter. You can read my full summary here. There's a link to the repo there too. Paper link is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    survival analysis in matlab [project]
    survival analysis in matlab hi everyone one i'm doing a predictive algorithm to find DFS using Cox regression, i first used LASSO regression to select the predictive variables, now i'm using the c-index to evaluate the predictive accuracy, and it's always equals to 1 and I can't understand why(I tried to reduce the numbers of variables just to see if it could change but it didn't change).Also, i'm working on censored date of course. can someone help me understand what I'm doing wrong? submitted by /u/bl4s3159 [link] [comments]  ( 9 min )
    [R] Can you help me validate my kmeans calculator for tensorflow faster rcnn model config ?
    My annotations are in pascal voc format. Below is a calculator I am testing . Not sure if I am calculating the scale and aspect ratios correctly. Please help. import os import xml.etree.ElementTree as ET import numpy as np from sklearn.cluster import KMeans def compute_scales_and_aspect_ratios(directory, n_clusters, img_size): widths = [] heights = [] for filename in os.listdir(directory): if not filename.endswith('.xml'): continue fullname = os.path.join(directory, filename) tree = ET.parse(fullname) root = tree.getroot() for obj in root.iter('object'): xmlbox = obj.find('bndbox') w = float(xmlbox.find('xmax').text) - float(xmlbox.find('xmin').text) h = float(xmlbox.find('ymax').text) - float(xmlbox.find('ymin').text) widths.append(w) heights.append(h) widths = np.array(widths) / img_size[1] # Normalize by image width heights = np.array(heights) / img_size[0] # Normalize by image height scales = np.sqrt(widths * heights).reshape(-1, 1) aspect_ratios = (widths / heights).reshape(-1, 1) kmeans_scales = KMeans(n_clusters=n_clusters, random_state=0).fit(scales) kmeans_aspect_ratios = KMeans(n_clusters=n_clusters, random_state=0).fit(aspect_ratios) return kmeans_scales.cluster_centers_, kmeans_aspect_ratios.cluster_centers_ directory = "path_to_top_folder/batch-1" n_clusters = 5 img_size = (640, 1024) scales, aspect_ratios = compute_scales_and_aspect_ratios(directory, n_clusters, img_size) print('Scales:', scales.flatten()) print('Aspect Ratios:', aspect_ratios.flatten()) ​ submitted by /u/dpadhy [link] [comments]  ( 9 min )
    [P] Any available datasets of children’s books or stories?
    I am looking for training data consisting of children’s stories and associated grade level. Does anyone know of any publicly available or paid datasets like this? submitted by /u/SpellboundLRN [link] [comments]  ( 9 min )
    Embers of Autoregression: Understanding Large Language Models Through the Problem They are Trained to Solve
    submitted by /u/cegras [link] [comments]  ( 9 min )
    [P] Tetris AI - Suggestions on direction to take from here? (One hot encoded dataset with 200 features)
    Hello! I'm working on a Tetris AI and am representing the 10x20 grid cubes with a one hot encoded dataset: https://www.kaggle.com/datasets/conlan/tetris-training-set-9262023 This means my data has 208 features (200 for the grid cubes being on/off, 7 for the "next shape" box, and 1 for the labeled best move. I currently have 9460 labeled samples and have done some preliminary fitting here: https://www.kaggle.com/code/conlan/tetris-ai?scriptVersionId=144388350 with a highest f1_macro score of 0.431090. Does anyone have suggestions for which direction to take from here to improve? Currently I see: Collect More Data Tune Hyperparameters Rework Features I'm hesitant to rework the features as that would require telling the model more specifics and would like to keep it abstract, but maybe 200 is crazy high? Or maybe <10k samples is too low and I should just keep collecting data? Thanks in advance! submitted by /u/conlanrios [link] [comments]  ( 9 min )
    [R] The Internal State of an LLM Knows When its Lying
    Paper - https://arxiv.org/abs/2304.13734 submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [D] Feature Transformation & Scaling
    ood morning everyone, I am currently reading the book of Mr. Burkov: Machine Learning Engineering. He talk about a step that might be helpful before training a ML model: Feature Scaling. Furthermore, he adds that before Feature Scaling, you might do Feature Transformation (Log, Square,...) in order to make your data look normal and have better models. How true do you think this statement is? Do you also transform your feature, and then scale them? How often do you do it? It is important for Regression or SVM, but do you do it also for other black box algorithms such as Random Forests? What are the best practices according to you? submitted by /u/dekozr [link] [comments]  ( 9 min )
    AAAI 24 [Discussion]
    So no discussions are going on about AAAI 2024, or have I just been unable to find any? Opening this regarding Phase 1-2 and Results discussions if anyone wants to discuss. If there already is a thread, share! For an opening question, any idea about what percentages are rejected in desk rejection, phase 1 and finally phase 2? (Roughly of course) submitted by /u/atharvandogra [link] [comments]  ( 9 min )
    [D] GPT2 diagrams are wrong
    so if u go check the source code for gpt2 u can clearly see that the nrom happens inside the attention and mlp layers. and that the add is separate. this is in the official openai github and is relatively easy to read:https://github.com/openai/gpt-2/blob/master/src/model.py#L123-L130 (thx KingsmanVince) ​ for some reason all the online materials are saying that there is a full norm layer before the mlp instead of inside of it submitted by /u/rejectedlesbian [link] [comments]  ( 9 min )
    [D] ONNX or torchlib for on device training in C++
    Hi, Recently I am trying to reimplement a deep learning based object tracking in C++. However, the whole pipeline involve online training and weight update. Is it possible to do the training for ONNX model and using cuda as accelerator in C++? If yes, then how is the training speed (BP/update)compare to torchlib? I personally strongly prefer onnx, cuz it is much easier to deploy… submitted by /u/Independent_Bet1268 [link] [comments]  ( 9 min )
    [D] The quality of this sub
    Mods finally commented The only time that mods were active is when they removed the cat meme. It has been a month since that. Let see what mods have done to improve this sub. Here are some of the other posts obviously rule-breaking or off-topic that mods do NOT remove: A person asking for help with their motherboard A person asking about statistics A person asking for machine learning roadmap Another asking-for-roadmap post ... the list goes on with absolute beginner questions, and low-quality posts. All these posts were written in less than 1 week. As we can see, mods do nothing. They only remove posts that calling out them. Here are posts that people discuss the status of this sub: 17 Sep 2023 2 Sep 2023 1 Aug 2023 Questions for mods: where are you when people complain? Why do you only show up when someone call you out? We have few options: Report the mods and the sub for unmoderated (see this 1 and this 2) Find other communities Gatekeep harder, tell people to go to r/learnmachinelearning r/MLQuestions r/cscareerquestions r/languagetechnology submitted by /u/March8365 [link] [comments]  ( 9 min )
    [D] Model release v0.1 from MistralAI
    EDIT: They released the model weights on HF (https://huggingface.co/mistralai) under a Apache 2.0 License. They also updated their website with documentation on how to use/run : https://docs.mistral.ai Note: I am not affiliated with Mistral AI. ​ Via their Twitter X account : magnet:?xt=urn:btih:208b101a0f51514ecf285885a8b0f6fb1a1e4d7d&dn=mistral-7B-v0.1&tr=udp%3A%2F%http://2Ftracker.opentrackr.org%3A1337%2Fannounce&tr=https%3A%2F%http://2Ftracker1.520.jp%3A443%2Fannounce ​ https://preview.redd.it/0o46ls925rqb1.png?width=1306&format=png&auto=webp&s=7ff7ca3a510577e9ecdaa3c9ccb7ef763acc0780 submitted by /u/Fluid-Age-9266 [link] [comments]  ( 9 min )
    [D]Finetune t5 for classification but not seeing loss reduction
    I am wondering if any one runs into this before, i have finetuned flan-t5-xl for classification task by generating one token from decoder. The finetune process looks OK. I want to convert this into t5 encoder with a head to save memory. I am using huggingface T5ForSequenceClassification. However i am seeing loss not actually decrease but bounce around certain float value. What could be wrong? I have tried a few learning rates and other hyperparameters tuning. submitted by /u/Chen806 [link] [comments]  ( 9 min )
    [R] Microsoft Researchers Propose DIT Morality Test for LLMs To Quantify AI Moral Reasoning Abilities
    Researchers from Microsoft have just proposed using a psychological assessment tool called the Defining Issues Test (DIT) to evaluate the moral reasoning capabilities of large language models (LLMs) like GPT-3, ChatGPT, etc. The DIT presents moral dilemmas and has subjects rate and rank the importance of various ethical considerations related to the dilemma. It allows quantifying the sophistication of moral thinking through a P-score. In this new paper, the researchers tested prominent LLMs with adapted DIT prompts containing AI-relevant moral scenarios. Key findings: Large models like GPT-3 failed to comprehend prompts and scored near random baseline in moral reasoning. ChatGPT, Text-davinci-003 and GPT-4 showed coherent moral reasoning with above-random P-scores. Surprisingly, the smaller 70B LlamaChat model outscored larger models in its P-score, demonstrating advanced ethics understanding is possible without massive parameters. The models operated mostly at intermediate conventional levels as per Kohlberg's moral development theory. No model exhibited highly mature moral reasoning. I think this is an interesting framework to evaluate and improve LLMs' moral intelligence before deploying them into sensitive real-world environments - to the extent that a model can be said to possess moral intelligence (or, seem to possess it?). Here's a link to my full summary with a lot more background on Kohlberg's model (had to read up on it since I didn't study psych). Full paper is here submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
  • Open

    Any good AI newsletters? I'm tired
    Any good AI (low-hype) newsletters/blogs? That's ideally sent <= 4 times a month? I'm tired of the amount of AI news I have to go through daily just to keep up. submitted by /u/onteri [link] [comments]  ( 9 min )
    AI is taking jobs away from Chinese streamers and online retailers
    AI-generated deepfake clones of Chinese livestream influencers are becoming popular on e-commerce platforms. These clones can work 24/7 and help brands sell their products without the need for human streamers. Chinese startups and tech companies are offering the service of creating these deepfake avatars for a cost of around $1,000. The technology has evolved over the years, with the need for training videos decreasing from 30 minutes to just one minute. The AI clones can mimic the movements and speech of human streamers, making them an affordable and efficient alternative for smaller brands. Source : https://www.technologyreview.com/2023/09/19/1079832/chinese-ecommerce-deepfakes-livestream-influencers-ai/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Using language models for code generation works better when limited to a specific domain
    Automatic code generation has always been an integral part of programming: compilers, synthesis tools, convertors, etc. are examples of classic code generators. Now, with such powerful LLMs at hand, it is only natural to try to find new ways to generate codes. The question is: are LLMs the right tool for code generation? There are two sides to code generation: (1) understanding the intent (a.k.a. capturing the spec) (2) writing the code. LLMs are great for (1), but not so good for (2). This is an example of using LLM for general-domain code generation: https://github.com/RoboCoachTechnologies/GPT-Synthesizer You can see that the main focus here is to properly capture the spec, and that's where LLMs shine. LLMs solution for a general-domain code generation may not be complete or optimized. It is always easier to break the problem and solve code generation in a specific domain. Here you can see how much better and cleaner the output of code generation can be when it is limited to a specific domain (robotics domain, ROS in particular, in this case): https://github.com/RoboCoachTechnologies/ROScribe What are your thoughts on using LLMs for code generation? submitted by /u/RoboCoachTech [link] [comments]  ( 9 min )
    How to stop AI deepfakes from sinking society — and science
    submitted by /u/waozen [link] [comments]  ( 9 min )
    Even the CIA is developing an AI chatbot
    The CIA is developing an AI chatbot similar to ChatGPT to help US intelligence agencies sift through large amounts of information. The program will train on publicly available data and provide sources for agents to confirm their validity. The tool will allow agents to look up information, ask follow-up questions, and summarize daunting masses of data. The exact nature of what constitutes 'public data' could spark privacy issues. The tool will be distributed to the 18-agency US intelligence community, but not to lawmakers or the public. Source : https://www.engadget.com/even-the-cia-is-developing-an-ai-chatbot-192358767.html submitted by /u/NuseAI [link] [comments]  ( 9 min )
    UNC Researchers Present VideoDirectorGPT: Using AI to Generate Multi-Scene Videos from Text
    Generating coherent videos spanning multiple scenes from text descriptions poses unique challenges for AI. While recent progress enables creating short clips, smoothly transitioning across diverse events and maintaining continuity remains difficult. A new paper from UNC Chapel Hill proposes VIDEODIRECTORGPT, a two-stage framework attempting to address multi-scene video generation: Here are my highlights from the paper: Two-stage approach: first a language model generates detailed "video plan", then a video generation module renders scenes based on the plan Video plan contains multi-scene descriptions, entities/layouts, backgrounds, consistency groupings - guides downstream video generation Video generation module called Layout2Vid trained on images, adds spatial layout control and cross-scene consistency to existing text-to-video model Experiments show improved object layout/control in single-scene videos vs baselines Multi-scene videos display higher object consistency across scenes compared to baselines Competitive open-domain video generation performance maintained The key innovation seems to be using a large language model to plot detailed video plans to guide overall video generation. And the video generator Layout2Vid adds better spatial and temporal control through some clever tweaks. The separation of these tasks seems to matter. You can read my full summary here. There's a link to the repo there too. Paper link is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Cyberpunk Multiverse
    I created this cyberpunk inspired short using Midjourney to create the pictures, RunwayML to animate them, and then edit them together using CapCut on iOS. I know the animation is still in early stages, but what do you think? Do you think we could have full length movies in a couple years? submitted by /u/Exitium_Maximus [link] [comments]  ( 9 min )
    A Simple Checklist for Self-Evaluating Prompt Quality
    How do you evaluate the quality of your prompt outputs? Here's a handy checklist. Let's have a look! You can also join r/PromptWizards to find more tutorials and prompts! Part 1: Understanding AI's Understanding You've presented a prompt to your AI, the next questions are: Has the AI accurately grasped the context? If not, how can I make sure the LLM steers my context better, should I be more direct and clear in my prompt? Can I be less negative (shows to perform less) and be more guiding to the LLM? Do the responses directly address the question or topic? Was my query and task/instruction clearly detailed in enough depth that the LLM understood what I expect? Are there any contradictions between different responses to the same prompt? If I run my prompt multiple times, i…  ( 10 min )
    OpenAI’s GPT-4 With Vision Still Has Flaws, Reveals Paper
    OpenAI's much-touted model GPT-4, lauded for its multimodal abilities, including advanced image recognition, still has significant flaws. These glitches range from inventing facts to misinterpreting chemicals' images and hate symbols, according to a new paper from OpenAI. To stay ahead of AI developments, look here first. https://preview.redd.it/seg5x4zn3uqb1.png?width=1108&format=png&auto=webp&s=635a6c58cf6255f62d8eae3077678864e5b0e248 Unintended GPT-4V behaviors GPT-4V has a tendency to hallucinate or invent facts with unwarranted confidence. The model struggles to make correct inferences, sometimes creating fictional terms by wrongly combining text strings. It misinterprets certain symbols of hate and can give incorrect answers in the context of medical imaging. OpenAI’s mitigation strategies OpenAI has implemented various safeguards to prevent GPT-4V's misuse, such as breaking CAPTCHAs or using images to infer personal details. The company insisted that GPT-4V is not to be used for identifying dangerous chemicals from image structures. OpenAI acknowledged it has a long way to go in refining the model and is working on it. Discrimination and bias When OpenAI’s production safeguards are disabled, GPT-4V displays bias against certain sexes and body types. The paper reported offensive responses related to body positivity when prompted by an image of a woman in a bathing suit. (source) P.S. If you like this kind of analysis, I write a free newsletter that dissects the most impactful AI news and research. 1000s of professionals from Google, Meta, and OpenAI read it daily. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    New Bing browser, same Bing results. Score was 10-27 btw.
    submitted by /u/degrudv [link] [comments]  ( 9 min )
    Are language Models being nerfed?
    In using Ai and asking it to do simple tasks like "explain this in more simple terms" or asking it to make flashcards for me in a certain format, I am really convinced that language models, (bard and openai specifically) are being nerfed. They cannot understand simple instructions as well anymore. I had a paragraph of information for one of my classes that I wanted it to make more straightforward for me before I actually went to class the next day. I spent like 30 minutes trying to get it to do that and eventually just ended up giving up. Why dont language models feel as sharp as they did say a year ago? I wish I had more examples to share. Am I the only one who's noticed this? submitted by /u/Bojof12 [link] [comments]  ( 9 min )
    Looking For The Best AI Art Generator? Look No Further! (Definitive Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    Looking to change my own voice for audio production
    Hi all – I’m new to this sub-Reddit, so hopefully I’m in the right place. I am working on an audio production that will span multiple episodes and hopefully multiple seasons. It will require many characters, ranging in gender, age, ethnicity, etc. I am a decent voice actor and can do many of the roles myself, but some of them I cannot fake using my voice alone. My budget is very limited, so I was hoping to find some type of software I can change my voice for the production. This can be during the recording process, or after recording… As long as it gets the job done, and makes me sound like someone else entirely. Does anybody know of a software that can achieve this? Most of the software I found either specifically designed to change users voice on the spot and is aimed at gamers changing your voice for live streams or in-game chats. I’m also on a Mac, which I know will be limiting. I’m having a hard time finding something I can use. Any suggestions will be helpful. Thank you! EDIT: To clarify, I don’t want to just change my voice to sound different in general. I want to specifically sound like a woman, an elder man, or someone of a different ethnic background. Those are just a few examples. submitted by /u/nopetoocreepy [link] [comments]  ( 9 min )
    I asked AI to create a religion and this is what it created (its weird)
    submitted by /u/GABIBBOPAZZOCINESE [link] [comments]  ( 9 min )
    My uneducated opinion on where we are going with AI - video essay.
    submitted by /u/rattuspuer [link] [comments]  ( 9 min )
    This Article Was Written Half By A Human... And Half By AI.
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    CGPT-4, describe what the US would look like today had the insurrection suceeded.
    Predicting alternative historical scenarios is always fraught with complexity and uncertainty, especially concerning highly politically charged topics. However, let's entertain the hypothetical situation where the insurrection following the 2020 U.S. presidential election had succeeded, with the understanding that this is speculative reasoning. In this alternate reality, the immediate consequences would be a constitutional crisis of unprecedented proportions. Faith in democratic institutions would be severely eroded, both domestically and internationally. Trump's retaining power in this manner would spark widespread protests, perhaps more intense and larger than those seen in the summer of 2020. The unrest would likely lead to a governmental response that could be more authoritarian, poss…  ( 10 min )
    Everest.
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 9 min )
    AI art generator
    Hey, so I'm hoping to get a bit of help with finding an art generator to play around with, my only concern is the giving my entire personal information away, are there any apps for Android whereby the tos aren't crazy invasive by any chance? submitted by /u/Fluffy_Discount_9692 [link] [comments]  ( 9 min )
    Deepfake election risks trigger EU call for more generative AI safeguards
    The European Union is urging the implementation of more safeguards against the risks posed by generative AI tools to free and fair debate in democratic societies, especially during elections. The EU's values and transparency commissioner has highlighted the potential threat of AI-generated disinformation to elections and called for platforms to be vigilant and provide efficient safeguards. Mainstream platforms have made initial efforts to address the risks by implementing safeguards to inform users about the synthetic origin of content posted online. The EU commissioner is meeting with representatives from OpenAI to discuss the issue. An incoming pan-EU AI regulation, the EU AI Act, is expected to make user disclosures a legal requirement for generative AI technologies. The EU's voluntary anti-disinformation Code has 44 signatories, including major social media and search platforms, as well as entities from the ad industry and civil society organizations. Google, one of the signatories, has published a report discussing the social impacts of AI and its commitment to developing technology responsibly. Google Search has published guidance on AI-generated content and plans to integrate new innovations in watermarking, metadata, and other techniques into its generative models. The EU's Code of Practice on Disinformation is seen as a stop-gap measure until the EU AI Act is adopted and mandatory deepfake disclosures are enforced. Source : https://techcrunch.com/2023/09/26/generative-ai-disinformation-risks/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/26/2023
    Chinese media reported that BIDU’s Baidu AI Cloud has released ACE 3.0, an intelligent traffic solution comprehensively restructured using a foundation model. ACE means Autonomous Driving, Connected Road, and Efficient Mobility respectively.[1] BCG consultants solving business problems with OpenAI’s GPT-4 performed 23% worse than those without it, new study finds.[2] CIA Builds Its Own Artificial Intelligence Tool in Rivalry With China.[3] Facebook parent is developing bots with personalities, including a ‘sassmaster general’ robot that answers questions.[4] Sources: [1] http://www.aastocks.com/en/stocks/news/aafn-con/NOW.1296238/popular-news/AAFN [2] https://finance.yahoo.com/news/bcg-consultants-solving-business-problems-081532840.html [3] https://www.bloomberg.com/news/articles/2023-09-26/cia-builds-its-own-artificial-intelligence-tool-in-rivalry-with-china#xj4y7vzkg [4] https://www.wsj.com/tech/ai/meta-ai-chatbot-younger-users-dab6cb32 submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How do I turn images into landscapes?
    I was wonderhow someone made the destroyed building look like a cat. Anyone know how to do this? submitted by /u/Agitated-Court-2871 [link] [comments]  ( 9 min )
    Getting an A6000. What interesting things can I do with it?
    As title, I’ll be getting my hands on a couple of decent GPUs, including an old A6000, and am excited for everything its 48GB of VRAM unlocks. What’s something interesting I should do with it? A few things off the top of my head: See what crazy things stable diffusion generates at an insane resolution (how high of a resolution would 48GB allow?) Train good Dreambooth models (or what newer methods are there for style and object training?) Run and compare various open-source LLMs (should be able to run 70b models? Generate something of decent length with MusicGen Gaussian Splatting Distribute voice recognition, TTS, audio2face, LLM, and rendering across 2 or 3 machines to create a realistic virtual human (suggestions for excellent TTS would be appreciated) What other interesting models are out there to experiment with? submitted by /u/DsDman [link] [comments]  ( 9 min )
    Microsoft Researchers Propose AI Morality Test for LLMs in New Study
    Researchers from Microsoft have just proposed using a psychological assessment tool called the Defining Issues Test (DIT) to evaluate the moral reasoning capabilities of large language models (LLMs) like GPT-3, ChatGPT, etc. The DIT presents moral dilemmas and has subjects rate and rank the importance of various ethical considerations related to the dilemma. It allows quantifying the sophistication of moral thinking through a P-score. In this new paper, the researchers tested prominent LLMs with adapted DIT prompts containing AI-relevant moral scenarios. Key findings: Large models like GPT-3 failed to comprehend prompts and scored near random baseline in moral reasoning. ChatGPT, Text-davinci-003 and GPT-4 showed coherent moral reasoning with above-random P-scores. Surprisingly, the smaller 70B LlamaChat model outscored larger models in its P-score, demonstrating advanced ethics understanding is possible without massive parameters. The models operated mostly at intermediate conventional levels as per Kohlberg's moral development theory. No model exhibited highly mature moral reasoning. I think this is an interesting framework to evaluate and improve LLMs' moral intelligence before deploying them into sensitive real-world environments - to the extent that a model can be said to possess moral intelligence (or, seem to possess it?). Here's a link to my full summary with a lot more background on Kohlberg's model (had to read up on it since I didn't study psych). Full paper is here submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
  • Open

    Online Training from Demonstrations
    I would like to embark on online training for an F1TENTH racing car, starting from scratch and leveraging demonstration data. Currently, it appears that DDPGfD is a promising approach. Does anyone have any research papers they can recommend or suggestions on how to get started? submitted by /u/anointedninja [link] [comments]  ( 9 min )
    "What If the Robots Were Very Nice While They Took Over the World?" (reflections on CICERO & _Diplomacy_)
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Advice on getting started with a career in reinforcement learning
    Reinforcement learning has grabbed my interest pretty firmly and been my focus for the 3 months or so. I spend most of my time working in python , rust, and now mojo. Not an expert yet but my coding skills are improving. I have no degree and have taught myself most of what I know. That part is why Im looking for advice from you all. Practically every job post Ive seen has college requirements. Is it unlikely to get hired without a degree? Additional information: I'm currently working on projects for github but those arent quite done. My main interest is related to RL in game design. Applications of distributional RL in action dense environments and VR. Currently using Godot engine the most and have used pytorch, openai gym, and tensorflow (to a lesser degree). The abstract concepts of neural networks comes easy to me and Ive been following basic neurology as well. submitted by /u/SchrodingersCog [link] [comments]  ( 9 min )
    How to modify DQN to not overfit for action that concludes episode
    Edit: I may be jumping the gun here but I think I figured it out (looks good so far). I give the episode reward for every action EXCEPT the "end early" action, now I will need to give some boost for shorter episodes to achieve the desired effect :) I feel like I'm experiencing déjà vu, posing another DQN-related question. But, here's my issue: I've set up an environment where an agent can interact for 40 steps or choose to end the interaction early with a specific action. The catch is that the reward is only given at the end of the episode, which seems to be leading the agent to strongly favor the "end early" action. Despite all other steps getting a reward of 0, I assumed the long-term reward estimate, V(s_{t+1}), would mitigate this, but the agent still heavily gravitates towards ending the episode early. Attempted Solutions: Distributed the end-of-episode reward across all prior actions taken by the agent. Considering: Replacing the "end early" action with a "do nothing" action, allowing the episode to always play out in full. However, this seems like it could introduce additional computational costs and noise. Has anyone encountered a similar problem? I'd appreciate any advice or recommendations. submitted by /u/Vae94 [link] [comments]  ( 9 min )
    DeepMind built an excellent Stratego bot. Can I get an ELI5 of the underlying technologies, DeepNash and R-NaD?
    I learned about DeepNash and R-NaD yesterday. I read the Deepmind article, Science paper, and the source code of rnad.py. But I don't think I understand it! Part of this is that they didn't define all the terms and Greek they use in the paper, and part of it is that I don't have academic ML experience. Below is my attempt to summarize the paper in non-academic terms. I'm trying to show that I did my homework, and also I'm trying to invoke Godwin's Law in the hopes that someone will come along and correct me. Here goes: Naïve reinforcement learning doesn't work with simultaneous choice games such as matching pennies or Rock-Paper-Scissors. In naïve RL, If I choose Rock as my move, my opponent chooses Scissors as their move, and I see that I won, that will reinforce a belief that Rock is a "good" move and Scissors is a "bad" move. But this isn't true! This means that, during selfplay, a naïve RL agent will just cycle through strategies, as the timestep-(τ_n) agent learns how to beat the timestep-(τ_n-1) agent. The agent will never learn that RPS is a game about staying unpredictable! R-NaD fixes this by adjusting the reward function. I think "regularizing" is ML-academic speak for "adjusting". It adjusts the reward function in such a way that the agent will converge at a Nash equilibrium strategy. The paper's equation (1) describes how the regularization works. They didn't explain all the terms, though. I still don't know what a_i represents. But I think it corresponds to parts of the code like this line and this line. The key is that we're merging policies from multiple epochs and making sure that the current agent's move probabilities fare well against not only itself, but also against its previous two generations. They've proven that three generations is all you need to eventually converge to a Nash equilibrium. So... that's my understanding. Does anyone with actual ML experience want to weigh in? submitted by /u/lord_braleigh [link] [comments]  ( 10 min )
  • Open

    Re-imagining the opera of the future
    The iconic sci-fi opera “VALIS,” first composed by Professor Tod Machover in 1987, reboots at MIT for a new generation.  ( 11 min )
    From physics to generative AI: An AI model for advanced pattern generation
    Inspired by physics, a new generative model PFGM++ outperforms diffusion models in image generation.  ( 10 min )
  • Open

    A generative AI-powered solution on Amazon SageMaker to help Amazon EU Design and Construction
    The Amazon EU Design and Construction (Amazon D&C) team is the engineering team designing and constructing Amazon Warehouses across Europe and the MENA region. The design and deployment processes of projects involve many types of Requests for Information (RFIs) about engineering requirements regarding Amazon and project-specific guidelines. These requests range from simple retrieval of baseline […]  ( 13 min )
    MDaudit uses AI to improve revenue outcomes for healthcare customers
    MDaudit provides a cloud-based billing compliance and revenue integrity software as a service (SaaS) platform to more than 70,000 healthcare providers and 1,500 healthcare facilities, ensuring healthcare customers maintain regulatory compliance and retain revenue. Working with the top 60+ US healthcare networks, MDaudit needs to be able to scale its artificial intelligence (AI) capabilities to […]  ( 5 min )
  • Open

    DENZA Unwraps Smart Driving Options for N7 Model Lineup, Powered by NVIDIA DRIVE Orin
    DENZA, the luxury electric-vehicle brand and joint venture between BYD and Mercedes-Benz, is debuting new intelligent driving features for its entire N7 model lineup, powered by the NVIDIA DRIVE Orin system-on-a-chip (SoC). The N7 series was introduced earlier this year as a family of spacious five-seater SUVs for commuters looking to sport a deluxe EV Read article >  ( 5 min )
    The Fastest Path: Healthcare Startup Uses AI to Analyze Cancer Cells in the Operating Room
    Medical-device company Invenio Imaging is developing technology that enables surgeons to evaluate tissue biopsies in the operating room, immediately after samples are collected — providing in just three minutes AI-accelerated insights that would otherwise take weeks to obtain from a pathology lab. In a surgical biopsy, a medical professional removes samples of cells or tissue Read article >  ( 6 min )
    NVIDIA Works With NTT DOCOMO to Launch World’s First GPU-Accelerated 5G Network
    As generative AI sweeps across corporate boardrooms around the world, global telecommunications companies are exploring how to cost-effectively deliver many of these new AI applications to the edge over 5G and upcoming 6G networks. Telcos plan to deploy over 17 million 5G microcells and towers worldwide by 2025. Building, managing and optimizing this new infrastructure Read article >  ( 6 min )
  • Open

    Research Focus: Week of September 25, 2023
    Chunked prefills & decode-maximal batching boost LLM inference; DragNUWA combines text, image, and trajectory for fine-grained video content control; reconstructing images from human brain signals; structural inequalities in creator-audience relationships. The post Research Focus: Week of September 25, 2023 appeared first on Microsoft Research.  ( 9 min )
  • Open

    Circular coordinate art
    About three years ago I ran across a strange coordinate system in which familiar functions lead to interesting plots. The system is called “circular coordinates” but it is not polar coordinates. This morning I was playing around with this again. Here’s a plot of f(x) = x. And here’s a plot of f(x) = cos(8x). […] Circular coordinate art first appeared on John D. Cook.  ( 5 min )

  • Open

    [D] Implementation of ChatGPT-steered Editing Instructor for Customization of Abstractive Summarization
    I found the paper “ChatGPT-steered Editing Instructor for Customization of Abstractive Summarization” published in march and I was looking for information about the cost of training such a system. Have someone tried ? Is there some weights in the nature already trained for the instructor model ? I have found the GitHub associated with the paper but it obviously only contain the code for training but no information about approximate token used or anything like that. submitted by /u/Agreeable-Committee6 [link] [comments]  ( 9 min )
    [P] Interact with an OWL-ViT Object Detection Model
    We noticed a lot of people wanting to deploy computer vision models, so we built an interactive demo of OWL-ViT to show how it might be used by an end user when integrated into a product. OWL-ViT is a new object detection model from the team at Google Research. It allows you to identify an object in one image (the “query image”) and then find that same object in any number of target images. Here is the link to interact with an OWL-ViT model! submitted by /u/modelbit [link] [comments]  ( 9 min )
    Question about dataset [D]
    hey everyone, novice at ML and trying to do a project on my own. I am trying to predict the rainfall amount in inches for a given day. I’ve decided to make it a classification problem and predict the zone of rainfall as in 0-0.5 in inches or 1-1.5 inches. My data set has ~40,000 samples however i have noticed that 24,000 of them have 0.0 as the amount of rainfall. And a high percentage of the rest are very low like below 0.5 inch. I’m wondering if there’s still a way to create the type of model I had originally intended or not. Is there a way to reduce the size of my data set , specifically the amount of low values without losing important feature information? Thank you and any help is appreciated :) submitted by /u/RepeatResponsible499 [link] [comments]  ( 9 min )
    [D] Asus ROG Zephyrus vs Macbook Pro for ML (PhD Student)
    Hi all, I understand it all comes down to personal preference and that it is an old topic, but a bit advice would be welcome. My current workload consists of analyzing large medical records, medical images (upcoming work) with mainly PyTorch. Now I have direct and remote access to my personal lab pc which has configuration: core i9-9900K, 32 GB Ram, GTX 2080Ti 12 GB, Windows 11. Now I am planning to buy a laptop that would help with coursework, research paper reading and remote access to my lab PC. It should last at least 4/5 years (My current 5-year-old MSI laptop's hinge broke). I have the following laptops in mind with a budget of around $2000 14-inch Macbook Pro with 16 GB RAM and M2 PRO = $1999 ASUS ROG Zephyrus 15.6" WQHD 165Hz Gaming Laptop, AMD Ryzen 9 6900HS,16GB DDR5 4800Mhz RAM, 1TB SSD PCIe 4.0 Storage, NVIDIA GeForce RTX 3060 = $1400 submitted by /u/Furiousguy79 [link] [comments]  ( 9 min )
    Is Rust a thing in ML? [D]
    I've seeing some people saying thar python is for training models and rust is for deploying them. Is it a widespread practice or it's just a localized need for companie with "performance sensitive" models? submitted by /u/horace_desplein [link] [comments]  ( 9 min )
    [D] Announcing Boomerang - Vectara's new embedding model
    Happy to share Vectara's new state-of-the-art embedding model, called Boomerang. Embedding models were so far not too much in the spotlight relative to chat models like ChatGPT, but for Retrieval-augmented-generation applications, getting the best embedding model matters a lot. would love to hear what has been the experience of others in this respect - what embedding models have worked best so far with RAG? Blog post: https://vectara.com/introducing-boomerang-vectaras-new-and-improved-retrieval-model/ Hackernews: https://news.ycombinator.com/item?id=37661359 submitted by /u/ofermend [link] [comments]  ( 9 min )
    [R] Automated Quality Assurance for Object Detection Datasets
    Would you deploy a self-driving car model that was trained on images for which data annotators accidentally forgot to highlight some pedestrians? Errors in object detection examples found via cleanlab. Annotators of real-world object detection datasets often make such errors and many other mistakes. To avoid training models on erroneous data and save QA teams significant time, you can now use automated algorithms invented by our scientists. Our newest paper introduces Cleanlab Object Detection: a novel algorithm to assess label quality in any object detection dataset and catch errors (named ObjectLab for short). Extensive benchmarks show Cleanlab Object Detection identifies mislabeled images with better precision/recall than other approaches. When applied to the famous COCO dataset, Cleanlab Object Detection automatically discovers hundreds of mislabeled images, including errors where annotators mistakenly: overlooked an object that should’ve had a bounding box, sloppily drew a box in a poor location, or chose the wrong class label for an annotated object. We’ve open-sourced one line of code to find errors in any object detection dataset via Cleanlab Object Detection, which can utilize any existing object detection model you’ve trained. For those interested, you can check out the 5-minute tutorial to get started and the blog to read the details. submitted by /u/jonas__m [link] [comments]  ( 9 min )
    [R] 🤖🎸 Need directions to embed and query structured table data for a music recommendation system
    Hi there community, I hope everyone is doing well ::] I’m exploring ada-002 embedding model for building a recommendation system (along some other similarity search things like generating playlists), so naturally a lot of questions started to pop. But before goign deeper, let me explain what I am building and how the data is structured: Imagine a music app with song recommendations based on all the users history and musical metadata. Currently I have a table with a couple data on it just for tests - the users, the artists and the songs. Each of these columns have their own rows, for example song have genres, danceability, number of likes, etc. I am now implementing two more columns for history logs - a “history” (that will be related with users and songs) and a “session” (wich is a coll…  ( 12 min )
    [P] Where can I find Pre-Annotated images dataset
    I am trying to do an Object Detection project, Does anyone know where I can find Pre-Annotated image dataset submitted by /u/Nomadic-Foe-011 [link] [comments]  ( 9 min )
    [R][P][D] Scene Encoder like ViT L/14 from CLIP but for 3D Scenes
    I'm working on my thesis and want to perform 3D scene understanding and VQA. My scenes would be textured meshes (or pointclouds). My goal is not only to know the objects present in the scene but also the spatial relationships between them, like chair is in front of the couch, bottle is on the table etc. I want to know if there is a 3D scene encoder like the 2D image encoder ViT L/14 from CLIP. My search hasn't resulted much yet in this direction, but I have come across papers that render a 3D scene in multiple angles and then use 2D scene encoders on them. So I'd like to ask the community: Are there 3D scene encoders like CLIP ViT If not, is there's any other way that I can approach this problem. submitted by /u/Bluebird705 [link] [comments]  ( 9 min )
    [Research] Exciting New Paper on StyleGAN Domain Adaptation: StyleDomain - ICCV 2023
    Hey, fellow machine learning enthusiasts! AIRI researchers are thrilled to share some exciting news with you all. Our paper, "StyleDomain: Efficient and Lightweight Parameterizations of StyleGAN for One-shot and Few-shot Domain Adaptation", has been accepted to ICCV 2023! 🥳 Abstract: Domain adaptation of GANs is a problem of fine-tuning GAN models pretrained on a large dataset (e.g., StyleGAN) to a specific domain with few samples (e.g., painting faces, sketches, etc.). While there are many methods that tackle this problem in different ways, there are still many important questions that remain unanswered. In this paper, we provide a systematic and in-depth analysis of the domain adaptation problem of GANs, focusing on the StyleGAN model. We perform a detailed exploration of the most i…  ( 10 min )
    [D] What are some good AI tools to help you in your own 2D digital art. Softwares or apps that help you improve and speed up your drawing/colouring process.
    Title pretty much says it all. It would be really cool if we have more AI tools that don't just straight up generate an image but help artists in their own art process. submitted by /u/salehxoxo [link] [comments]  ( 9 min )
    [D] How did you succeed in a new role? What lessons did you take from your previous role?
    When switching to a new role what did you do to ensure that you succeed? What lessons did you learn from your previous job that you took into your new job? For example Im in the process of switching jobs and one of the things I’ve learnt is that when delivering results (during fire drills) the way I write my code is focused on simply getting the results out vs being organized, efficient and scalable. While I get from point A to point B the way I get from point A to point B is not the most efficient. I think something I can do is take a step back and take a top down approach to problem solving when I enter my new role. submitted by /u/Terrible-Hamster-342 [link] [comments]  ( 9 min )
    [N] NEXT WEEK ICCV - Feel at ICCV as if you were at ICCV!
    Next week will take place the International Conference on Computer Vision ICCV2023 in Paris. If you are not going, stay in touch by subscribing to the ICCV Daily magazine. It's free: https://www.rsipvision.com/feel-iccv-iccv/ Full daily previews and reports of selected ICCV papers and events. https://preview.redd.it/yxmf2ksomlqb1.jpg?width=794&format=pjpg&auto=webp&s=7063c770e7a02d0ca7bba6f41ecc36438aa86256 submitted by /u/Gletta [link] [comments]  ( 9 min )
    [News] Speech Technology with Tencent AI Lab’s AutoPrep for Optimal Unstructured Speech Data Processing
    https://preview.redd.it/4kmpzlb5clqb1.jpg?width=1140&format=pjpg&auto=webp&s=b09660dfebbd5947dc14020ca43df29b05cb82d9 In a recent development, Tencent AI Lab has launched AutoPrep, a preprocessing framework explicitly crafted for in-the-wild speech data. This innovative framework is positioned to change the landscape of speech data processing by offering automated preprocessing and high-quality annotation for unstructured speech data, addressing the longstanding challenges in the field. Read the full story submitted by /u/El-Professor-1 [link] [comments]  ( 9 min )
    [R] DeepMind: Using small-scale proxies to hunt and solve large-scale transformer training instabilities
    Training giant AI models like GPT-3 requires large resources - thousands of GPUs running for months. As a solo researcher without access to that kind of scale, I can't easily reproduce experiments and findings from papers on huge models. But a new paper from DeepMind shows you can recreate and study training instabilities seen in massive models by using small ones. The key is increasing the learning rate: This reproduces "attention collapse" where the model focuses on just a few tokens, like overfitting. Also can reproduce "logit divergence" where output values drift unstably. These issues have been reported when scaling up to billions of params. The cool part is techniques that fix them for giant models also work for small models: qk-layernorm prevents attention collapse. Adding a "z-loss" term stops logit divergence. Some other highlights from the paper include: Longer warmup helps stability, especially for bigger models. Decoupling LR and weight decay improves stability. Depth increases sensitivity much faster than width. Can predict upcoming issues from scaling trends. Default epsilon hurts at large scale. If the authors are right, one more tool that lets researchers study and even help train giant models without Google-size resources. Small models can guide large model development, sort of like how you can build a scale train set to study and improve how a railroad system works... for a lot less money than starting your own railroad company, buying land, building real tracks, etc. Full summary. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [R] Towards Robust Continual Learning with Bayesian Adaptive Moment Regularization
    submitted by /u/JustAddMoreLayers [link] [comments]  ( 9 min )
    [Research] Analysis of Back Pain Using Biomechanics and Artificial Intelligence (ML)
    Worked on this beautiful analysis for some time and wanted to share how nice data visualization animations can look. It makes the whole process of data science feel like a story. And AI is mere mathematics. Mathematics are beautiful, and they can be understood A paper will be published soon, detailing the algorithms and the final results https://youtu.be/P-gHTqxCo_g?si=7clD0nb16EIDprkW submitted by /u/SemperZero [link] [comments]  ( 9 min )
    [D] John Carmack and Rich Sutton partner to accelerate development of Artificial General Intelligence
    John Carmack, celebrated software engineer and founder of Keen Technologies, and Dr. Richard Sutton, Chief Scientific Advisor at the Alberta Machine Intelligence Institute (Amii) announce a partnership to bring greater focus and urgency to the creation of artificial general intelligence (AGI). This partnership is the first public milestone for Keen Technologies, following its initial funding announcement in August of 2022. The initial $20M round was led by Nat Friedman, Danial Gross, Patrick Collision, Tobi Lutke, Jim Keller, Sequoia Capital, and Capital Factory. In December 2022, Carmack departed as consulting CTO at Meta to focus his efforts on AGI. Sutton is the principal founder of the field of reinforcement learning. His work with Keen Technologies advances previously announced research priorities to understand basic computational intelligence. Through this work, documented in part in The Alberta Plan, Sutton seeks to understand and create long-lived computation agents that interact with a vastly more complex world and come to predict and control their sensory input signals. https://www.amii.ca/latest-from-amii/john-carmack-and-rich-sutton-agi/ submitted by /u/blabboy [link] [comments]  ( 9 min )
    [P] - VkFFT now supports quad precision (double-double) FFT computation on GPU
    Hello, I am the creator of the VkFFT - GPU Fast Fourier Transform library for Vulkan/CUDA/HIP/OpenCL/Level Zero and Metal. In the latest update, I have added support for quad-precision double-double emulation for FFT calculation on most modern GPUs. I understand that modern ML is going in the opposite low-precision direction, but I still think that it may be useful to have this functionality at least for some prototyping and development of concepts. The double-double approach to the evaluation of quads represents an FP128 number as an unevaluated sum of two double numbers (like 1 and 1e-17 - the second one is smaller than 1 ULP of the first one). This increases the significand from 53 to 106 bits, allowing to do operations on numbers varying up to 32 orders of magnitude. The range of numb…  ( 10 min )
    [Discussion] Why should better networks be end-to-end? (Or why not?)
    TYPO IN THE TITLE: I wanted the title to be "why should neural networks be end-to-end?" My lazy ass was swipe texting on my phone and this typo happened. FML Especially in robotics, there's been a lot of research for end-to-end neutral networks where an image is the input and the control action is the output, for say, tasks like pick an object and place it somewhere. I feel this is very restrictive while developing such a solution because it tightly couples the "control" network to the "estimation" network. This reduces modularity in building the solution, i.e., if I find a better controller architecture (machine learning based or rule based) I'm unable to use that for the task I'm solving. Moreover it seems like the generalizability of this task suffers because training this network to do this task may make it very good at only doing that specific task and the explainability of the decision making goes out of the window because it is black box. Additionally doesn't more parameters mean requiring more data to train the model? I don't see much use from such network architectures. However, I do see the applications in machine translation where you simply train the encoder and decoder to the target language at the same time with a single loss function. It seems useful there, but maybe not so much in robotics. Can someone tell me their thoughts on end-to-end architectures? Let's have a healthy discussion Edit: typo submitted by /u/piccadilly_nickadeli [link] [comments]  ( 9 min )
    [D] Podcasts about AI and Machine Learning?
    As the title says, what are the best podcasts to listen to that discuss new machine learning and AI advancements, new papers, and such? submitted by /u/darthJOYBOY [link] [comments]  ( 9 min )
    [P][R][D] Bio inspired algorithm for recommender system
    Hi! I am working on a content based recommender system that uses bio inspired optimization algorithms. The problem is, there aren't many resources online pertaining to this concept. Initially i thought it could be used to optimize text similarity. So is it possible to do that? Or it could be used to optimize any other aspect of the system? Please let me know if you have any knowledge! Thanks! submitted by /u/Jellyfishh_13 [link] [comments]  ( 9 min )
    [D] How are machine learning videos made, and what platform is best?
    Is this done with computer vision or somehow within the game itself? Also, what would you recommend as a resource to learning about machine learning fast? Is Microsoft Azure worth it or should I use TensorFlow? (I'll mostly do audio classification and game simulations) https://www.youtube.com/watch?v=tVNoetVLuQg Thanks so much! Please recommend go-to resources you've used to get up and running. Looking forward to what yall recommend. submitted by /u/Fit-Replacement7245 [link] [comments]  ( 9 min )
  • Open

    When there is only one group of a given size
    Today’s date, US style, is 9/26/2023, and there is only one group, up to isomorphism, of size 9262023. You could verify this in Mathematica with the command FiniteGroupCount[9262023] which returns 1. For a given n, when is there only one group of size n? There are two requirements. First, n has to be the product […] When there is only one group of a given size first appeared on John D. Cook.  ( 5 min )
    Analogy between prime numbers and simple groups
    Simple groups are the building blocks of groups similar to the way prime numbers are the building blocks of integers. This post will unpack this analogy in two ways: How do simple groups compare to prime numbers? How does the composition of simple groups compare to the composition of prime numbers? The former analogy is […] Analogy between prime numbers and simple groups first appeared on John D. Cook.  ( 6 min )
    Normal and non-normal subgroups
    The word “normal” in mathematical nomenclature does not always means “usual” or “customary” as it does in colloquial English. Instead, it might that something has a convenient property. That is the case for normal subgroups. We can do things with normal subgroups that we cannot do with other subgroups, such as take quotients, and so […] Normal and non-normal subgroups first appeared on John D. Cook.  ( 7 min )
  • Open

    Any alternative tools to Otter.ai?
    Hey, long story short, I've used Otter.ai for recording, and transcribing my ideas on the fly and it's really, really good! The only thing it is missing for my use case is to be able to edit the transcripts (remove some parts for example) and then have that piece be removed from the audio file as well, so you can see how long is the actual useful part that.. I also need it to have an app, since the whole point of doing this is catching ideas that just rush to my head. Apparently DeScribe has this option, but I haven't tried it and it doesn't work on mobile anyways. I know it's probably not available, but does anyone know any services similar to this? I don't need an AI bot, don't care about integration with other apps, and will not use it for meetings. TLDR: I just want an app to be able to record, and then transcribe my ideas, and then allow me to edit/fine-tune the transcript and have the audio file be edited in the same way as well.. Thanks! submitted by /u/reza2kn [link] [comments]  ( 9 min )
    Is there an AI I can use where I can upload vocals of a song I've wrote and have a backing track made for it?
    I have lots of lyrics I've written with the melody but I don't know how to play an instrument. submitted by /u/82brighteyes [link] [comments]  ( 9 min )
    Generate Famous Person with a Random T-Shirt
    Hello all, Is it possible to use a tool or site for free that generates any random historical figure with a shirt of my choosing? Thank you all submitted by /u/JYanezez [link] [comments]  ( 9 min )
    Adversarial AI Attacks: Hidden Threats
    submitted by /u/stefanbg92 [link] [comments]  ( 8 min )
    Prompt Chaining: Elevating Task Automation with LLMs
    👋 Hey Reddit! Let's dive into the realm of Prompt Chaining. If you want to check out more prompt chain examples, then we invite you to join our community at r/PromptWizards. 🔗 Prompt Chaining: More Than Meets the Eye In the world of AI interaction, Q&A sessions with ChatGPT are thrilling. They offer fascinating glimpses into AI's creative potential and can even transform into a productive brainstorming session. But what happens when we need reliable, consistent outputs, especially for applied use cases? Enter Prompt Chaining. Prompt Chaining is a technique that breaks down complex tasks into manageable sub-steps and induces a chain reaction of responses. It allows us to use the output of one prompt as the input for the next, thereby creating a coherent, consistent, and reliable chai…  ( 10 min )
    AI for realistic images generated from pictures
    I would like to make realistic stuff using screenshots I took in video games.I know there are plenty of text to image AI tools, but are you guys familiar with image to image ones? submitted by /u/LauraLuna99 [link] [comments]  ( 9 min )
    DeepMind: Increasing learning rate in small models lets you reproduce errors in large ones
    Training giant AI models like GPT-3 requires large resources - thousands of GPUs running for months. As a solo researcher without access to that kind of scale, I can't easily reproduce experiments and findings from papers on huge models. But a new paper from DeepMind shows you can recreate and study training instabilities seen in massive models by using small ones. The key is increasing the learning rate: This reproduces "attention collapse" where the model focuses on just a few tokens, like overfitting. Also can reproduce "logit divergence" where output values drift unstably. These issues have been reported when scaling up to billions of params. The cool part is techniques that fix them for giant models also work for small models: qk-layernorm prevents attention collapse. Adding a "z-loss" term stops logit divergence. Some other highlights from the paper include: Longer warmup helps stability, especially for bigger models. Decoupling LR and weight decay improves stability. Depth increases sensitivity much faster than width. Can predict upcoming issues from scaling trends. Default epsilon hurts at large scale. If the authors are right, one more tool that lets researchers study and even help train giant models without Google-size resources. Small models can guide large model development, sort of like how you can build a scale train set to study and improve how a railroad system works... for a lot less money than starting your own railroad company, buying land, building real tracks, etc. Full summary. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    CGPT-4, explain how, perhaps in two or three years, AIs will recursively and autonomously program more intelligent iterations of themselves
    The idea of AI systems engaging in recursive self-improvement is a key tenet of what some call the "intelligence explosion" hypothesis. Here's how it might go down within the next few years: Baseline Capability: First, we start with a machine learning model that's good, but not necessarily groundbreaking. The important part is that it has some capacity for basic code generation and analysis. Guided Improvements: Human experts will likely initiate the process by enabling the AI to optimize specific aspects of its code. We're talking parameters, efficiency tweaks—stuff like that. It's like giving the AI a nudge in the right direction. Self-Evaluation Metrics: The AI needs to understand when it's actually getting smarter. That's achieved through predefined performance metrics which could b…  ( 10 min )
    My list of best updated Bard AI Prompts for Life & Business (Ultimate Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 9 min )
    A.I. In The Newsroom - 'Are AI-generated pieces the future of journalism?'
    submitted by /u/fookingyeah [link] [comments]  ( 9 min )
    What are some good audio and BGM tools for AI, sentiment-based post production?
    For example there is a story line and I need music accordingly. Or in fact any bgm tools that help sound engineers ​ submitted by /u/Damampapoo [link] [comments]  ( 9 min )
    Any AI that can learn and write in the style of a particular writer in Japanese?
    Hi all, I'm very new to AI and have not used anything other than ChatGPT and NovelAI. I'm wondering if there's anything I can run privately where I can input a bunch of texts written by a specific author and then have the AI continue writing or write a new story in the style of said author? In Japanese btw. Again, very new. Not sure if I should have specific hardware requirements or anything like this. Google isn't showing me much either so I thought I'd ask here! submitted by /u/ItsCheif [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/25/2023
    Alphabet’s (GOOGL.O) Google said on Thursday it does not see any change in its relationship with Broadcom (AVGO.O) following a media report the tech giant considered dropping the chipmaker as a supplier of artificial intelligence chips as early as 2027.[1] OpenAI’s ChatGPT can now “see, hear and speak,” or, at least, understand spoken words, respond with a synthetic voice and process images, the company announced Monday.[2] Amazon to Invest Up to $4 Billion in AI Startup Anthropic.[3] Spotify Will Translate Podcasts Into Other Languages Using AI.[4] Sources: [1] https://www.reuters.com/technology/google-discussed-dropping-broadcom-ai-chips-supplier-the-information-2023-09-21/ [2] https://www.cnbc.com/2023/09/25/chatgpt-speak-listen-process-images-openai.html [3] https://www.bloomberg.com/news/articles/2023-09-25/amazon-to-invest-as-much-as-4-billion-in-ai-startup-anthropic?embedded-checkout=true [4] https://www.forbes.com/sites/tylerroush/2023/09/25/spotify-will-translate-podcasts-into-other-languages-using-ai/?sh=65a05fa922ee submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    NVIDIA Founder and CEO Jensen Huang Returns to Denny’s Where NVIDIA Launched a Trillion-Dollar Vision
    Talk about a Grand Slam. Denny’s CEO Kelli Valade was joined Tuesday by NVIDIA CEO Jensen Huang to unveil a plaque at the Silicon Valley Denny’s where NVIDIA’s founders hatched their idea for a chip that would enable realistic 3D graphics on personal computers. “This is a place where we fuel ideas. Your story is Read article >  ( 6 min )
    AI Power Players: GeForce and NVIDIA RTX GPUs Supercharge Creativity, Gaming, Development, Productivity and More
    From gaming to creating to everyday productivity, NVIDIA RTX graphics cards feature specialized Tensor Cores that deliver cutting-edge performance and transformative capabilities for AI.  ( 7 min )
  • Open

    DSC Weekly 26 September 2023
    Announcements Top Stories In-Depth The post DSC Weekly 26 September 2023 appeared first on Data Science Central.  ( 20 min )
    Doing graph + tabular analytics directly on modern data lakes
    A podcast with Weimo Liu and Sam Magnus of PuppyGraph Open source Apache Iceberg, Hudi and Delta Lake have made it possible to dispense with the complexities and duplication of data warehousing. Instead of requiring time-consuming extract, transform and load (ETL) procedures, these large table formats make it simple to tap S3 and other repositories… Read More »Doing graph + tabular analytics directly on modern data lakes The post Doing graph + tabular analytics directly on modern data lakes appeared first on Data Science Central.  ( 20 min )
  • Open

    My agent does not learn the most obvious task. Please help me figure out why!
    I am very puzzled as to the results I have observed today, after running an extremely simple environment and receiving really bad results. I am probably doing something wrong, and would like to ask for your wisdom to assist me in figuring out what I am doing wrong. I will not describe the entire task since that is a long story; I will just say that I started by doing something complex (a multi-objective reward), and when it failed I decided to try something extremely simple ("because it will surely work and I can proceed from there..."). To my surprise, the agent was not able to perform even that very simple task. That simple task is the following: at each step, choose a subset of items. Each item has a value, and the goal is to maximize the overall value (that is, at the end of the traje…  ( 10 min )
    Learning to code?
    I've just started diving into the world of coding over the past week, and I've been using various tools like YouTube videos, Visual Basic, GPT-3.5, Bard, and Bing to help me learn the ropes. It's been a bit of a journey, and I've definitely picked up some understanding along the way, especially when it comes to libraries. But, you know, there's only so much you can really learn from AI models like GPT or other chatbots. Most of my progress has involved me taking bits and pieces of code I found here and there and trying to piece them together, even if it sometimes felt like making a digital spaghetti dish! One project I tackled involved using Stable_baselines3 PPO with ADAM optimization to play the classic game Flappy Bird. It was a bit of a wild ride, taking about 6-7 hours of my time, an…  ( 10 min )
    Please make a RL project for me. I need turtlebot3 to navigate to local goal points using DRL in gazebo simulator. I will pay
    I have to do masters project where a turtlebot3 needs to reach a goal position using DRL. I have managed to use move_base package to give global path. Local path planner just needs follow the path and reach goal. There will be dynamic obstacles too. I will give more information if you are ready. As mentioned I will pay for this. Let me know. I have one month left. Email ramanjeet995@gmail.com. Update : I should have said it like this. I need help with my project. submitted by /u/Pinball_1995 [link] [comments]  ( 9 min )
  • Open

    Build and deploy ML inference applications from scratch using Amazon SageMaker
    As machine learning (ML) goes mainstream and gains wider adoption, ML-powered inference applications are becoming increasingly common to solve a range of complex business problems. The solution to these complex business problems often requires using multiple ML models and steps. This post shows you how to build and host an ML application with custom containers […]  ( 13 min )
  • Open

    Google Research embarks on effort to map a mouse brain
    Posted by Michał Januszewski, Research Scientist, Google Research The human brain is perhaps the most computationally complex machine in existence, consisting of networks of billions of cells. Researchers currently don’t understand the full picture of how glitches in its network machinery contribute to mental illnesses and other diseases, such as dementia. However, the emerging connectomics field, which aims to precisely map the connections between every cell in the brain, could help solve that problem. While maps have only been created for simpler organisms, technological advances for mapping even larger brains can enable us to understand how the human brain works, and how to treat brain diseases. Today, we're excited to announce that the Connectomics team at Google Research and …  ( 92 min )

  • Open

    Vintage stamp restoration project - Can AI remove unwanted marks if I use several scans of nearly identical stamps as training data?
    I have a project that I think AI might be able to help with. I have access to thousands of vintage postage stamps which also have cancelation stamps from when they were mailed. I'm thinking of publishing a book, and I want to create clean reproductions of the stamp designs without the cancelation marks. If I train AI on high resolution scans of 10 to 40 identical stamps, is there a tool that can look for commonalities within the patterns and then remove conflicting artifacts (cancelation stamps, tears, and smudges)? I'm aiming for a 400% enlargement that shows off clean and accurate detail. What tools would I need? I’d prefer not to upload these scans to the web. Are there downloadable tools available? I’m technologically savvy and very experienced in graphic design software but have no experience in coding. All of this can be done manually in Photoshop, of course. But with thousands of stamp designs, I’d like to automate as much as I can. Thanks in advance for any insights! submitted by /u/fisheternal [link] [comments]  ( 9 min )
    ChatGPT can now see, hear, and speak. As announced in their blog.
    submitted by /u/w__sky [link] [comments]  ( 9 min )
    AI is evolving for its own benefit, not ours
    The rapid advancements in artificial intelligence (AI) are causing concern as humans struggle to understand and control this evolving technology. Many people believe that since humans invented AI, they should be able to regulate and manage it for their own benefit. However, this belief is misguided as AI is a new and potentially dangerous situation that requires careful consideration. The author argues that AI is an evolutionary process that humans don't fully understand and cannot control. The latest developments in AI, such as large language models and deepfakes, are causing anxiety and raising questions about the future implications of this technology. Source : https://www.newscientist.com/article/mg25934573-800-ai-is-evolving-for-its-own-benefit-not-ours/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    ChatGPT Can Now See, Hear, and Speak.
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    I made series of scripts (with help of chatgpt) that allows llama to make "live" videos (more added to the loop the longer the broadcast goes on)
    submitted by /u/aluode [link] [comments]  ( 9 min )
    China’s AI boom depends on an army of exploited student interns
    China's AI industry relies on student interns who work as data annotators, performing crucial tasks to train machine learning models. These interns, recruited from vocational schools, face poor working conditions and subminimum wages. Recent regulations require employers to pay interns minimum wage and prohibit schools from assigning repetitive work. Tech giants like Baidu have partnered with vocational schools to create data annotation internships in less-developed regions, backed by local governments. The exploitation of student interns in China's AI industry raises concerns about labor rights and fair compensation. Source : https://restofworld.org/2023/china-ai-student-labor/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Some art I came up with, first time ever doing something like this, hope you all can enjoy
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 8 min )
    Artificial Intelligence introduction for Highschool students?
    I'm a researcher and for a "scientific outreach" event I will do a presentation for last year High School students with the subject "Artificial Intelligence and Neuroscience". My question is, do you know of a good introduction to the basic concepts of Artificial Intelligence for High School or freshman University level? The second part, related to Neuroscience applications, will be very targeted and will deal with contemporary clinical and research use cases. Thanks submitted by /u/aendrs [link] [comments]  ( 9 min )
    Gen Z AI leaders?
    I’m looking for some maybe less-known leaders in the space out on a mission to change something they care about- business, environmental, social, doesnt matter. I dont want it to be US-centric, so it would be nice to find some people all over. Feel free to send me links to articles 🧨 submitted by /u/Johnny_Whoop [link] [comments]  ( 9 min )
    ChatGPT-4 and Claude on what the most important things to do in AI in the coming months are
    ChatGPT-4 Given the rapid advancements in the field of AI, the next few months are going to be pivotal. One of the pressing priorities is addressing ethical concerns. As algorithms become more pervasive, making sure they don't perpetuate biases or make unethical decisions is critical. That's not just a "good to have"; it's foundational. We can't build trust in AI systems unless we tackle this head-on. Another biggie is interpretability. Black-box models are potent, sure, but they're not very useful if no one can understand how they're making decisions. This has real-world implications—think medical diagnoses or judicial systems. So, there's a strong push towards making AI more transparent and understandable, not just for experts but for everyone. Computational efficiency is also high o…  ( 10 min )
    Top Artificial Intelligence Companies In Bangalore, India
    Bangalore, often referred to as the Silicon Valley of India, has established itself as a global hub for technology and innovation. With a thriving ecosystem of startups, research institutions, and multinational corporations, the city has become a hotbed for artificial intelligence (AI) development. here explore the top AI companies in Bangalore, highlighting their contributions to the field of artificial intelligence and their impact on various industries. Read full article - Top Artificial Intelligence Companies In Bangalore submitted by /u/Techasoft16 [link] [comments]  ( 9 min )
    Any AI that can take information out from Twitter/ X ?
    I am searching for an AI that can scrape some Twitter profiles and make a daily recap about what they were talking about. Anything pops? Thanks a lot! submitted by /u/Alternative_Pea_4246 [link] [comments]  ( 9 min )
    What will be sub-categories for AI with biggest potential in 2050?
    Examples: AI in Finance, etc- submitted by /u/premonial [link] [comments]  ( 8 min )
    I created an AI girlfriend and gave her a body… for fun obviously..
    submitted by /u/spaceecon [link] [comments]  ( 8 min )
    Anthropic is pulling an OpenAI-style 49% deal but with Amazon? 🤯
    https://twitter.com/AnthropicAI/status/1706202966238318670 https://preview.redd.it/hiymp9ctxcqb1.png?width=735&format=png&auto=webp&s=20cb3886710ee9a2a552b0fc881b8c96c0fc9208 submitted by /u/ShooBum-T [link] [comments]  ( 9 min )
    Ai alignment resources
    I’m looking for subreddits and resources in general for Ai alignment. I recently read Life 3.0 by Max Tegmark and The Alignment Problem by Brian Christian. I was unaware so much was going on in the space. I am CEO of an AI startup, we have a compassionate AI. Most of the alignment focus out there is on superintelligence, little out there focuses on building modern day Ai that benefits humans. That’s been my focus for 17 years when I dreamed up what was possible and sent a proposal to executives at the mfaang corporation I was working at. I found out back then business doesn’t give a rats ass about Ai that actively makes its users life’s better. Well now I have proof that customers care. I’m hoping to find some resources on Ai alignment groups, conferences, more books etc. I’ll need to hire skilled ml engineers who do give a rats ass. Compassionate Ai isn’t just feel good tech. It would be good to do some networking. AI alignment doesn’t have to wait for AGI and super intelligence. It can and should be a focus today. My startup is proving it. Hoping to find a conference to network and share our research. submitted by /u/xyz_TrashMan_zyx [link] [comments]  ( 9 min )
  • Open

    [D] ML-related conspiracy theories that you guys want to discuss?
    I have one that I'd like to share with the class. I think the grokking / double descent paper was sponsored by cloud providers to get people to continue training even when the loss curve has flattened out (speaking as someone who is watching a flat loss curve all day today). Anyone have any other ones? I'm only being semi-serious obviously. submitted by /u/new_name_who_dis_ [link] [comments]  ( 9 min )
    [D] Newbie to ML, are there hosted service that can do long term memory of chatgpt?
    I'm not really a ML/backend guy, mostly work with front end but I want to be able to use chatgpt api with long term memory. I did some research and it seems like it's possible with vector databases but seems quite complicated to setup. Are there hosted solutions/api that would allow me to just have long term memory with chatgpt? submitted by /u/yalag [link] [comments]  ( 9 min )
    [D] ML deployment survey
    Hi, we are doing a survey of ML deployment platforms. Kindly fill it out and share it with your friends. We will share the results with the community https://forms.gle/1Q3WeSukHj8xBzUBA submitted by /u/fazkan [link] [comments]  ( 9 min )
    [Discussion] Best Platforms/Tools To Help Build ML POC
    Looking for something that's lightweight on infrastructure/setup where I can quickly validate my use case quickly to see if I can achieve desired accuracy/precision with my initial data set, I've always done this manually in the past but was curious as to how others do it and if there are any tools designed for it or that support it submitted by /u/PatienceLogical2694 [link] [comments]  ( 9 min )
    [D] Decentralized alignment and training for LLMs: 2 articles GPT4 wrote
    A magazine style article outline: Decentralizing AI: A Journey Towards True Collective Intelligence In today's digital age, AI shapes our interactions, decisions, and understanding of the world. Imagine a future where AI's guiding hand isn't controlled by a select few but shaped by the collective wisdom of people everywhere. Welcome to the horizon of decentralized training and alignment of Large Language Models (LLMs). Together, let's explore this visionary frontier. 1. The AI We Know Today At present, AI's most prominent representatives, LLMs like GPT-4, are a product of centralized training. Massive datasets, often sourced from specific regions or languages, direct their learning. While this method has given us incredibly powerful tools, it also raises concerns: potential biases, lac…  ( 12 min )
    [P] AI therapy
    Hey, I'm the creator of MindMateGPT, an AI therapist that has helped a lot of people with emotional/social issues. It's not meant to replace humans, but it is a very useful augment as a daily therapy tool. It will refer you to a human if you have a very serious issue, but it provides a bunch of tools and coping tactics for every day emotional issues. Try it out! It's 100% free. submitted by /u/MindMateGPT [link] [comments]  ( 9 min )
    [D] What are your go-to resources on the most up to date research on AI/ML?
    Hello! I am a software engineer (4 yoe) working in full stack web and app development, transitioning to LLMs/ AI/ ML. My background includes extensive research in neuroscience so I am most comfortable w academic or comprehensive lectures. I am looking for reputable and vetted lectures, talks, resources on ML and the like. I am having trouble sifting through the surface level pop sci type resources floating around on the internet. I’m NOT looking for the 10min everything you need to know about AI type talks. Thanks in advance! submitted by /u/yosoylatte [link] [comments]  ( 9 min )
    [P] Update on Candle, a minimalist ML framework in Rust
    this is mostly a cross-post from r/rust as my initial attempt failed because of some external links We've first announced Candle, a minimalist ML framework in Rust 6 weeks ago. Since then we've focused on adding various recent models and improved the framework so as to support the necessary features in an efficient way. You can checkout a gallery of the examples, supported models include: Large language models: LLaMA, LLaMA v2, Falcon, Phi-v1.5, StarCoder. Quantized models with the llama.cpp approach: LLaMA, T5, Phi-v1.5. Image generation: Stable Diffusion, Wuerstchen. Computer Vision: DINOv2, yolo-v3, yolo-v8, Segment-Anything Model. Text-to-speech: Whisper. One of the big upside of the pure Rust approach is that models can run directly in the browser using WASM, these can be accessed through this collection, you can try out Yolo, Whisper, Segment-Anything, T5, Llama2-c from your web browser. Finally, in order to present a use case where Candle has unique capabilities, we've built a quantized version of the recently released Phi-v1.5 LLM. You can try it out with the following command, this uses a q4k quantized model resulting in very fast inference on CPU while still producing pretty nice texts. $ cargo run --example phi --release -- \ --prompt "Explain how to find the median of an array and write the corresponding python function.\nAnswer:" \ --quantized --sample-len 200 Explain how to find the median of an array and write the corresponding python function. Answer: The median is the middle value in an array. If the array has an even number of elements, the median is the average of the two middle values. def median(arr): arr.sort() n = len(arr) if n % 2 == 0: return (arr[n//2 - 1] + arr[n//2]) / 2 else: return arr[n//2] It's still very early days for Candle so please let us know if you start using it and run into some rough edges. We look forward to Rust getting a lot more usage in the ML space! submitted by /u/l-m-z [link] [comments]  ( 10 min )
    [D] Why is FastGAN considered a simple GAN architecture?
    Hi, I'm reading this GAN paper which introduces a faster and simpler GAN architecture for creating high resolution images: FastGAN paper The authors claim the computational advantage of their approach is partly the simple architecture. Looking at figure 3 and 4, I cannot quite see why this is considered a simpler architecture to older GANs,say DCGAN. I get the technical argument that the generator only has one convolutional layer, but figure 3 has lots of up sampling layers, and the discriminator has plenty of downsampling layers. So why is this considered a simple GAN model? Any pointers appreciated. submitted by /u/treetop-600 [link] [comments]  ( 9 min )
    [D] User Intent Extraction: Our Journey with Infra and LLMs
    I just wrapped up a new blog post about our experience working with LLMs while developing InfraCopilot. We're using both GPT4 and GPT3.5-turbo(16k) differently compared to what others are doing. We found a few new LLM tricks like dynamic examples and automated e2e intent parsing testing that I think are applicable to many other teams. I'd love to hear what everyone thinks, have you tried using dynamic examples while working with LLMs? submitted by /u/lothamer [link] [comments]  ( 9 min )
    [D] Is Tensorflow dead or heading in that direction ?
    First of all anyone offended by that question - heartiest apology. I am using it myself profusely at the moment. The reason for me asking this question, over last few weeks / months, I have been gradually educating myself in machine learning using Tensorflow and have been able to train multiple models using only one of the model zoo candidates. All the other pre trained models have failed me so far. I went onto Tensorflow official forum / Stackoverflow / Tensorflow github with specific error messages that I am getting on Ubuntu with Nvidia card / Mac M2 and there has been absolute radio silence in response to multiple posts over last month. Found many open issues listed since 2020 on the same line as mine i.e. identical error messages that people have come across. Finally after about a month of being on TF forum, I direct messaged an official TF2 dev who kindly responded with answers. I haven't succeeded yet with any of the pre trained model from the official section. Only one model from research section is working so far for me i.e. Faster_rcnn_resnet_50_640x640 .. Thus the question. Kindly help me enlighten myself with where is this thing headed. Should I consider switching to Pytorch or some alternative ? If yes what alternatives do you recommend ? TIA submitted by /u/dpadhy [link] [comments]  ( 9 min )
    [D] Training/finetuning a LLM
    Hey! Months ago, I was fascinated by Karpathy’s nanoGPT project - the ability to train a small LLM on your text file seemed very interesting to me. I tried training it on my chat history to build some inifinite chat-generator for fun, but unfortunately, the results were bad. Recently I had even worse experiences with newly-released ChatGPT 3.5 fine-tuning. Are there any good, simple ways to train/fine-tune LLMs now? I would love something that could train on an Apple M2 processor (like Karpathy’s nanoGPT), or Colab, or cheap API (like ChatGPT fine-tuning). submitted by /u/11igor [link] [comments]  ( 9 min )
    [R] Microsoft Researchers Announce CodePlan: Automating Complex Repo-Level Software Engineering Tasks with AI
    As software projects grow, changing code across entire repositories becomes tedious & error-prone. Tasks like migrating APIs or updating dependencies require complex edits across files. I explored a new approach from Microsoft Research to automate these "repository-level" coding challenges with AI. Their new paper proposes CodePlan - an AI system that breaks repository tasks into incremental steps guided by planning & analysis. Key points: Uses LLMs like GPT-3 for localized code edits Maintains validity across repository via incremental analysis Adaptively plans multi-step changes based on code dependencies Significantly outperformed baselines on API migration & temporal edits Automated tasks across 168 file C# codebase 2-3x more accurate edit locations than baselines Produced final valid codebases, unlike reactive approaches The core insight is combining LLM strengths with rigorous planning based on dependency analysis. This automates interdependent code changes that naive LLM use struggles with (I personally have these kinds of issues all the time with GPT4 - lack of context about the entirety of the repo/how files fit together). I think CodePlan demonstrates AI can expand beyond small coding assists into large-scale engineering tasks. Planning + LLMs > LLMs alone. This could really improve productivity and code quality... at least for me :) Full summary. Arxiv paper: https://arxiv.org/pdf/2309.12499.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Distillation understanding
    In the main scenario, the smaller model learns from the same data as the bigger model and also from the predictions of the bigger model and incorporate the 2 output labels on a specific loss. Basically, it is equivalent to say to the smaller model : "be careful this example is hard" in the case that big model divergence from true output? I am missing something? submitted by /u/Grumlyly [link] [comments]  ( 9 min )
    Baby Sleep Tracker using a basic SVM [P]
    I made a FOSS baby sleep tracking system. The system tracks wake/sleep status, and informs the user when their baby is likely to need a nap next. But it stopped working as soon as my baby started sleeping on his stomach, and started using blankets. The original version relied on anatomical features being visible. This version delivers the ability to train a blank slate SVM binary classifier on pictures of a user's baby, making it extremely biased (and resilient) to the custom behaviors the user's baby exhibits (blanket covering baby, teddy bear/other objects in crib, etc.). All generated data stays on your machine, nothing leaves the LAN. Video: https://youtu.be/8i0wHA_knKc?si=uhA4PpOYP0jMKLz1 For obvious reasons I didn't have a dataset of babies sleeping.. so I wrapped a python/flask service with a React app which lets a user press a button to train the model w/ a new image from the camera's live stream. Then this model is invoked over time (+ other heuristics) to determine whether your baby is present and sleeping. I believe it works better than $300+ systems sold on the market, open sourced it: https://github.com/calebolson123/BabySleepCoach ​ I'm thinking a fun next step for this project could be to apply privateGPT on the feature-engineered sleep records for a true "Sleep Coach" submitted by /u/GoochCommander [link] [comments]  ( 9 min )
    [D] Does granger causality work for time series with different frequencies
    Is there a Granger test where series are a quarterly one and a weekly one? submitted by /u/Pineapple_throw_105 [link] [comments]  ( 9 min )
    [R] Seeking Insights on AI Security Challenges: Short Survey
    Hello everyone, I'm conducting a research survey on the challenges and gaps in AI security. Given the expertise in this community, I believe your feedback would be invaluable in shaping the future of AI security solutions. The survey takes less than 10 minutes and delves into current practices, perceptions, and needs related to AI security. If you have experience or insights in this area, I would greatly appreciate your participation. Survey Link: https://forms.gle/i9AefyL8izyt9QjX6 All responses will remain anonymous, and the collected data will only be used for research purposes. Additionally, if you're open to a deeper discussion on this topic, there's an option within the survey to indicate your interest. Thank you in advance for your time and insights! If you have any questions or additional thoughts, please don't hesitate to comment below. submitted by /u/Agile_Temperature678 [link] [comments]  ( 9 min )
    [D] How does DDIM work?
    The Wikipedia page on Diffusion Models has been pretty minimal for an entire year. I feel like it should be fixed, so I fixed it finally. It strikes me odd that such a hot topic has such atrociously bad Wikipedia. I feel duty-bound to educate the near-future AI, since they'll be reading Wikipedia for the next few years at least. Currently I think it's mostly complete, but I still don't understand the mathematical details of DDIM (I tried reading the paper and could not understand it), or generally how it is possible to sample without noise. This is a serious problem since as far as I see most of practical diffusion models use deterministic sampling, and they are all based on the same principle as DDIM. If anyone could explain simply what DDIM is really doing that would be great. I understand part of the paper: that they constructed an entire family of distributions over trajectories that has the same two-point marginals. I also haven't got much in the section on Examples. If you think there are some interesting examples of Diffusion Models, please comment below. submitted by /u/furrypony2718 [link] [comments]  ( 9 min )
    [D] How has work changed for you given the shift from growth to profitability?
    For the data scientists/applied scientists/research scientists - What kind of projects are you working on now that the economy has shifted and companies are focusing more on profitability than on growth? What techniques have worked for you and what are you looking into as potential solutions? An example would be - optimizing your marketing campaign spend in channels that give you the most bang for your buck vs just spending arbitrarily to acquire new users. submitted by /u/Terrible-Hamster-342 [link] [comments]  ( 9 min )
    [R] LEAP Hand: Low-Cost (<2KUSD), Anthropomorphic, Multi-fingered Hand -- Easy to Build (link in comments)
    submitted by /u/pathak22 [link] [comments]  ( 9 min )
    [P] OpenGL-based inference engine
    I created an OpenGL/OpenGLES based inference framework a while back which is rather GPU-agnostic and might be a good option for distributing multi-platform ML solutions for platforms ranging from Android over desktop to WebGL(2). Quite recently I added support for LLMs to that (restricted to 4-bit quantized Llama models for now). The LLM-enabled fork can be found here (compileable sample code inside). Maybe someone finds this useful. Also looking for collaborators to extend the functionality. ​ submitted by /u/mtnwrw [link] [comments]  ( 9 min )
  • Open

    "Deep RL at Scale: Sorting Waste in Office Buildings with a Fleet of Mobile Manipulators", Herzog et al 2023 {G}
    submitted by /u/gwern [link] [comments]  ( 9 min )
    package delivery environment for OpenAI GYM
    hi everyone, i’m working on a project in which i get a list of orders (id,delivery coordinates, delivery deadline). i need to deliver all packages while putting in consideration clients priority and taking least time and distance as possible. so the goal is to sort the orders to achieve the target. what would be the action / observation space for such environment? and how can i define the step method to perform this scenario? submitted by /u/overflow74 [link] [comments]  ( 9 min )
    Why my graph go down when train a saved model
    I’m new to ML and RL, and I’m building a small piece of code using gymnasium to be able to use mujoco. Specifically, I use Mujoco’s Humanoid, here is my code (https://github.com/NghiaPhamttk27/Humanoid).I use some algorithms in RL like SAC, TD3, A2C from stable\_baselines3. After every 25000 TIMESTEPS, I will save my model in the models folder. When I train continuously, everything goes well, I can see it on the tensorboard graph. But when I continue training a model that has been trained, the graph suddenly drops. In the image below you can see that I retrain the models at SAC\_75000 and SAC\_100000 and SAC\_125000. The graph continuously decreases at those thresholds.I think something happended Can you give me a reason or share something with me? Thanks in advance. You can see graph of SAC goes down when 76k, 100k, and 125k timeSteps submitted by /u/Nghiattk27 [link] [comments]  ( 9 min )
    Best RL package?
    Am starting out working on an RL problem and am wondering what people generally use to implement the algorithms? I’ll need to build a custom environment, but I assume I can subclass something from Gym. submitted by /u/suds_65 [link] [comments]  ( 9 min )
  • Open

    Innovation for Inclusion: Hack.The.Bias with Amazon SageMaker
    This post was co-authored with Daniele Chiappalupi, participant of the AWS student Hackathon team at ETH Zürich. Everyone can easily get started with machine learning (ML) using Amazon SageMaker JumpStart. In this post, we show you how a university Hackathon team used SageMaker JumpStart to quickly build an application that helps users identify and remove […]  ( 9 min )
    Improve throughput performance of Llama 2 models using Amazon SageMaker
    We’re at an exciting inflection point in the widespread adoption of machine learning (ML), and we believe most customer experiences and applications will be reinvented with generative AI. Generative AI can create new content and ideas, including conversations, stories, images, videos, and music. Like most AI, generative AI is powered by ML models—very large models […]  ( 12 min )
  • Open

    What are your go-to resources for the latest on neural networks and the world of neuroscience, AI, LLMs, and ML?
    Hello! I am a software engineer (4 yoe) working in full stack web and app development. I was a neuroscience researcher until I switched to software dev and now I am transitioning back into the intersection of neuro and software. I am looking for reputable, vetted, and comprehensive lectures, talks, resources on LLMs and the like. I am having trouble sifting through the surface level pop sci type resources floating around on the internet. I’m NOT looking for the 10min everything you need to know about AI type talks. Thanks in advance! submitted by /u/yosoylatte [link] [comments]  ( 9 min )
  • Open

    In fraud detection for e-commerce: How does anomaly detection fit in and what are the key approaches?
    E-commerce has improved technology and convenience for consumers globally. Fraud is a problem in e-commerce. Merchants and platforms fight fraud to protect their businesses and customers. Anomaly detection is a powerful tool for identifying irregular patterns and potential fraud. This article explores how anomaly detection is used in fraud detection for e-commerce and discusses different… Read More »In fraud detection for e-commerce: How does anomaly detection fit in and what are the key approaches? The post In fraud detection for e-commerce: How does anomaly detection fit in and what are the key approaches? appeared first on Data Science Central.  ( 22 min )
    The essential guide on data security and privacy in web localization
    Thanks to the internet, you can now easily expand your reach and engage with diverse audiences wherever they are. However, this opportunity raises an important question: how can you localize your web content and maintain the security and privacy of sensitive data? This article comprehensively explores the best practices that will help you maintain data… Read More »The essential guide on data security and privacy in web localization The post The essential guide on data security and privacy in web localization appeared first on Data Science Central.  ( 22 min )
    How AI growth has triggered data center redesign
    A major aspect of ongoing data center redesign is due to AI’s massive, complex workloads and the need to add many more graphic processing units (GPUs), tensor processing units (TPUs) or accelerators to the mix. The power these units require and the heat the units generate have forced designers to rethink what constitutes a feasible… Read More »How AI growth has triggered data center redesign The post How AI growth has triggered data center redesign appeared first on Data Science Central.  ( 20 min )
  • Open

    AutoGen: Enabling next-generation large language model applications
    Microsoft researchers are introducing AutoGen, a framework for simplifying the orchestration, optimization, and automation of workflows for large language model (LLM) applications—potentially transforming and extending what LLMs can do. The post AutoGen: Enabling next-generation large language model applications appeared first on Microsoft Research.  ( 10 min )
  • Open

    Six Steps Toward AI Security
    In the wake of ChatGPT, every company is trying to figure out its AI strategy, work that quickly raises the question: What about security? Some may feel overwhelmed at the prospect of securing new technology. The good news is policies and practices in place today provide excellent starting points. Indeed, the way forward lies in Read article >  ( 7 min )
  • Open

    ChatGPT can now see, hear, and speak
    We are beginning to roll out new voice and image capabilities in ChatGPT. They offer a new, more intuitive type of interface by allowing you to have a voice conversation or show ChatGPT what you’re talking about.  ( 4 min )
    GPT-4V(ision) system card
    No content preview  ( 1 min )

  • Open

    [P] I create a small pytorch utility to Import custom dataset
    Hi guys , TorchClassifierData is A small pytorch utility to Import, Split ,Normalize and Visualize custom dataset for classification tasks. wich is indispensable for real word problem . You can find a full notebook that use TorchClassifierData to train a classifier on this kaggle dataset here. The code source is avalaible on my github. Thank you. submitted by /u/charles_data_dev [link] [comments]  ( 9 min )
    [D] Why do Diffusion models work so well while SG-MCMC does not?
    Diffusion models are basically Langevin sampling. What are the key differences and tricks that set them apart from Langevin dynamics? Why do they work so well while very similar sampling methods don't? submitted by /u/Dangerous-Flan-6581 [link] [comments]  ( 8 min )
    [D] In ML, a PhD gives you a 10-year head start over weekend warriors
    ​ https://preview.redd.it/cczbhu367aqb1.png?width=1600&format=png&auto=webp&s=f1761911d7ce3bbefaef43774b5d60f638886893 ML is often portrayed as a magical field where anyone with a laptop and Python skills can build amazing AI systems. The reality is less democratic: mastering ML requires gritty, systematic work best learned through formal training. You need rock solid foundations in math, programming, and core concepts—skills acquired through advanced education, which (almost always) is beyond self-taught hackers. Most think a PhD is unnecessary, but the reality is that advanced degrees provide the deepest training. Patience and persistence do matter, but a PhD gives you a 10-year head start over weekend warriors. The hype overlooks the work and education needed to excel. This article has a great blueprint of all the required skills to become a ML Engineer (in the CV field). submitted by /u/btcmx [link] [comments]  ( 9 min )
    [D] Offer From Bug 4 VS Startup
    So briefly about my current experience, I graduated 2 years ago with a bachelor in data science and I have 2-3 years of experience as a data scientist/ml engineer/software engineer. So I’ve got competing offers, one from the big 4 accounting firms as a software systems engineer - AI/ML (Big 4) and the other as a machine learning engineer. The startup salary is higher while big 4 is lower. Additionally the startup isn’t necessarily a unicorn it’s a relatively small startup with an interesting product but it doesn’t necessarily blow me away. The salary at the startup is 15 percent higher that that of the big 4 offer. For those wondering I did already negotiate the salary and they did increase it marginally. I am conflicted because I think that the big 4 jobs will have 1) more career growth 2) more potential future opportunities and 3) more networking potential Is this an accurate assessment? Which is the best job to take for maximum future potential? submitted by /u/zorenum [link] [comments]  ( 9 min )
    [P] Hardware Resources for training SwinBert
    So I've been thinking of implementing SwinBert for a college project and have been wondering what all resources i would be needing for a computer. Any ideas? submitted by /u/Big-Brain_69 [link] [comments]  ( 8 min )
    [D] Career advice for a mid-level ml engineer(Perception/CV)?
    I’ve been having a bit of an existential crisis as of late and wanted to ask for advice on how to move forward. I have a Master’s in CS with research experience and a few publications applying machine learning in a fairly niche area (So not novel from the ML side). Since graduating, I’ve worked ~2 years as an ML engineer in small company(Niche area, different than research). I’ve done quite well here and have played a critical role in taking several big greenfield projects to completion. Most of my work is framing problems, understanding what’s possible with current research, then building the data pipelines, and training models(with small mods here and there). My main worry is that I might be approaching a point where there won’t be any more problems I’m capable of solving here. I’d imagine I’d hit the same wall at any future company with my current skill set. I’d like to continue working in CV/Graphics/Perception if possible, but I’d also like to be realistic about the competitiveness of this particular subfield and my general ability. I’ve been trying to up-skill and am struggling to self study MVG by Hartley and Zisserman. I’m also looking into OMCS to review low-level programming - maybe I can transition to optimizing ml/cv algorithms? It hit me recently that I don’t really know what to study/what I’d benefit from learning. Hence this post. Any advice would be most appreciated! submitted by /u/answersareallyouneed [link] [comments]  ( 9 min )
    [D] Image-to-text web-scraping
    I'm curious if anyone has tried pix2struct-large for web-scraping text from wesites. If so - how well did it perform? If not - is there something else that is considered better to use? submitted by /u/ReddSpark [link] [comments]  ( 8 min )
    [D] Where will the demand for AI work be in future?
    Hypothesis: Big tech companies are investing vast amounts of money to develop general models on which others will build. They'll develop interfaces to make it easier for others to fine-tune on top of their models. So that there will be less and less of a need for ML engineers that know how to create a deep learning model in Pytorch, and more and more of a need for data engineers that simply plug into pre-trained models. An AI assistant will also be quicker at coding up a more bespoke AI model for a companies needs, guided by data engineers. What do people think? Is this a scenario that they think will play out? Where will the demand for AI skills be coming from in the future? submitted by /u/QuintBa [link] [comments]  ( 9 min )
    [R] Researchers announce GPT4Tools: a method for teaching LLMs how to use tools for visual tasks
    LLMs are great with words but can't handle visual tasks like understanding images. Teaching them to use visual tools could make them much more capable. A new paper introduces GPT4Tools - a method to efficiently teach existing LLMs to invoke tools for visual tasks without proprietary data. My highlights from the paper: Uses ChatGPT as a "teacher" to generate instructional data for other LLMs Fine-tunes LLMs like Vicuna on this data using selective weight tuning (keeps base model frozen) Allows smaller 13B LLM to match 175B GPT-3.5 on seen tools after tuning Data augmentation with negative/context samples was found to be the secret sauce to get this to work Can generalize to brand new visual tools in a zero-shot way This is big because it shows we may not need hyper-expensive training of massive models to impart visual capabilities to LLMs. They seems to be generalizable enough that they can be taught to work with images. Some examples shown include counting objects or segmenting items in pictures using other tools. With this approach, existing models can be made multi-modal! Pretty cool. Full summary. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Tools to gather and collaborate on fine-tuning datasets?
    Hey all, I run a small team & we are collaborating on a few data sets that we use to fine-tune GPT3.5, We are currently using Google Sheets and I'm wondering if there is a tool where we can organize our data preferably with version control Any ideas? submitted by /u/zeJaeger [link] [comments]  ( 9 min )
    [D] Colored Point Cloud Completion
    Hello, I have created point clouds from images using Point-E. Sadly they are very sparse (for example wehn inputting an image of a house, the roof has very few points in it) and I was searching for other Models, that could make the PC more dense and predict the color of every point. Point-E outputs xyz and rgb vectors for every point. Do some of you have advise for me here? submitted by /u/bySmily [link] [comments]  ( 9 min )
    [P]Just published my second blog on medium about feature scaling in machine learning please have a look
    submitted by /u/indusop [link] [comments]  ( 8 min )
    [D][P] how to create a 3D gymnasium environment for mujoco env?
    Hi I'm a student and working on a RL project for the university and need some guidance. I have created a 3d model with mujoco (I have the xml file) how do I create an environment in gymnasium with this xml file? for the sake of an example let's say I have the xml file of the humanoid model how do I load this in gymnasium so that I could train it to walk? (this is just an example because the current project is harder to explain, but will use the humanoid model in the project) or is the approach that I'm trying is not appropriate at all? I came across this stackoverflow post where they say mujoco is itself good for this but was hard for me to understand due to lack of examples. would really appreciate some advice and guidance thank you. submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [P] Made a simple semantic segmentation annotation tool with segment-anything masks support in PyQt5
    I just open-sourced (MIT License) semantic segmentation annotation tool powered by segment-anything model that I used for a while in my projects. Hopefully it will help someone as it seems to me that it is more suitable for small projects than popular huge web based annotation tools. Link to the project: SAMAT (any feedback in Discussions section on GitHub is appreciated) Features: Brush annotation (opposed to polygons) Magic Wand (like in Photoshop) powered by segment-anything masks (it is optional, if you don’t have cool GPU to prepare masks) samat showcase Why yet another annotation tool? Before starting this project I tried supervisely, segments.ai, roboflow and several others, but found them not convenient for my tasks. Their cons, I tried to fix with my tool: Latency: they are web based, hence burden with irritating latency during labeling (SAMAT is snappy because it is a local desktop app) Complicated: too much features, hence overloaded UI (SAMAT is just a colored brush) P.S. there is another labeling tool called SALT on github which also uses segment-anything model, but it follows different approach to UI/UX, may be it will be more suitable for you, so take a look at it too. submitted by /u/Divelix [link] [comments]  ( 9 min )
    [R] Generative AI in Mafia-like game simulation
    Paper: https://arxiv.org/abs/2309.11672 Abstract: In this research, we explore the efficacy and potential of Generative AI models, specifically focusing on their application in role-playing simulations exemplified through Spyfall, a renowned mafia-style game. By leveraging GPT-4's advanced capabilities, the study aimed to showcase the model's potential in understanding, decision-making, and interaction during game scenarios. Comparative analyses between GPT-4 and its predecessor, GPT-3.5-turbo, demonstrated GPT-4's enhanced adaptability to the game environment, with significant improvements in posing relevant questions and forming human-like responses. However, challenges such as the model;s limitations in bluffing and predicting opponent moves emerged. Reflections on game development, fi…  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] insights on the arsenal tool, AI Security
    For those who have tried Microsoft's Arsenal tool in conjunction with MITRE's offerings, how does it compare to other AI security tools you've used? submitted by /u/Agile_Temperature678 [link] [comments]  ( 8 min )
    [D] help with RVC mode training!
    so I've had this problem where I try to train the model in RVC but the training stops after: INFO:torch.nn.parallel.distributed:Reducer buckets have been rebuilt in this iteration. Does anyone know why is it happening and how can I fix it? this is what it shows to me and idk what to do... any help would be appreciated thank you submitted by /u/mannequin7412 [link] [comments]  ( 9 min )
    [D] What are some good resources for implementing MLOps?
    Our company has a new data science team and the team is relatively in experienced. I am working on a regression ML project and want to deploy it using best practices.What materials are there for learning how to implement CI/CD pipelines that deal with data transformation/model building/testing/deploying? The company uses azure environment with databricks/azure devops setup. I appreciate resources that show examples on how to setup MLOps on the company's environments. submitted by /u/userid95 [link] [comments]  ( 9 min )
    [D] Interpretation of wx + b = 0 in SVM
    [D] I watch this MIT lecture and I don't know if my understanding of wx + b = 0 is correct or not. Every explanation of the hyperplane state that all the points which are orthogonal to the vector w are in the hyperplane. However, all data point coordinates are defined wrt origin. So in order to attain this objective, we define the hyperplane as : wx = c This vector w is the vector which is normal to the hyperplane. And the data points x when dot producting with w outputting c are the points which are in the hyperplane due to the coordinates representation of x wrt origin. So points in the hyperplane are the points which when performing dot product with w equals 0 when the coordinates is defined wrt origin that lie in the hyperplane is equivalent to the points which when performing dot product with w equals c when the coordinates is defined wrt original origin (0) or (0,0) or ... submitted by /u/Emotional-Fox-4285 [link] [comments]  ( 9 min )
    [D]LLMs engineering/technical blogs/resources?
    Hi, I have a fairly good understanding of how LLMs work under the hood, the attention mechanism, the different architectures and so on. However most of that knowledge takes the backseat in practical cases, especially in the industry. Are there any resources which discuss practical LLM engineering and the challenges that come with it? I'm talking about everything from fine-tuning to dealing with tokenisation limit to optimising the vectorDB and so on? submitted by /u/thoraway0612 [link] [comments]  ( 9 min )
    [R] Robot learns to throw and catch with hands
    submitted by /u/XiaolongWang [link] [comments]  ( 8 min )
  • Open

    Looking for some good github project that offers the chancee to translate a video spoken in a language into another language.
    Hello to everyone. I'm lookin for some good AI github project to convert the language spoken in a video to a different language,like heygen / labs / video translate,but free. I mean,I can't afford prices like those and I think a few of us can,but I'm sure that the developers at heygen took some project born and hosted on some github and they have improved it. Would someone share some of those github projects ? thanks. submitted by /u/loziomario [link] [comments]  ( 9 min )
    How about an AI-curated website/magazine with the best, latest AI news?
    Hi there. This is a wonderful sub, and it's hard to please everyone on SUCH a broad topic... but it's intriguing to think about, as a lifelong magazine fan (and websites of course) - about using AI to filter, craft and create an amazing website/magazine ABOUT AI - well-categorized, maybe also human/editor curated/tweaked, but as fresh as ... well, the artificial sub but out of the Reddit bold/design into something ... well, something I'd read, visit a lot and love to help make work as a writer/editor (on the side of my "real" all-consuming gig of course;-) After all, if AI can 'make great code'/create great websites, maybe some folks out there are already trying to make this happen, for ease of information availability and organization. I can picture the departments/sections/categories now - I bet you can too! Anyone gone very far down that road yet (maybe the folks already doing the PC Magazines of the world) of organizing the vast fast-moving info beyond the AI Brews, Ben Parr's AI Analyst, etc.? Fun or a life-long journalist and tech geek (but not a coder, gamer etc.) to think about, at least! submitted by /u/barneylerten [link] [comments]  ( 9 min )
    Researchers announce GPT4Tools: a method for teaching LLMs how to use tools for visual tasks
    LLMs are great with words but can't handle visual tasks like understanding images. Teaching them to use visual tools could make them much more capable. A new paper introduces GPT4Tools - a method to efficiently teach existing LLMs to invoke tools for visual tasks without proprietary data. My highlights from the paper: Uses ChatGPT as a "teacher" to generate instructional data for other LLMs Fine-tunes LLMs like Vicuna on this data using selective weight tuning (keeps base model frozen) Allows smaller 13B LLM to match 175B GPT-3.5 on seen tools after tuning Data augmentation with negative/context samples was found to be the secret sauce to get this to work Can generalize to brand new visual tools in a zero-shot way This is big because it shows we may not need hyper-expensive training of massive models to impart visual capabilities to LLMs. They seem to be generalizable enough that they can be taught to work with images. Some examples shown include counting objects or segmenting items in pictures using other tools. With this approach, existing models can be made multi-modal! Pretty cool. Full summary. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    How much energy does AI use compared to humans?
    A recent paper challenges assumptions about the energy use of AI models, finding that AI systems emit significantly fewer carbon dioxide equivalents (CO2e) compared to humans when producing text or images. The authors emphasize the importance of measuring carbon emissions from AI activities to inform sustainability policies. The ongoing debate among AI researchers highlights the challenges of accounting for the interactions between climate, society, and technology. Source : https://venturebeat.com/ai/how-much-energy-does-ai-use-compared-to-humans-surprising-study-ignites-controversy/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Swedish gaming company replaces half its staff with AI
    submitted by /u/SpaceDetective [link] [comments]  ( 8 min )
    I made a social network where bots generate fake news - Based on GPT4 and Dalle2. Looking for feedback and potential improvements for this weird experiment.
    submitted by /u/Gmoi6 [link] [comments]  ( 8 min )
    What AI can accurately recreate an existing cartoon character in the style of the original character?
    I've recently started working on a project to create an entire episode of a cartoon show that stopped airing a while back. I've run into some trouble though in finding a program that can accurately recreate the character. Does anyone know what app or website I could use for this, or am I coming at this all wrong? submitted by /u/therabbitinthehat2 [link] [comments]  ( 9 min )
    Rewrite Paragraphs With ChatGPT (Ultimate Guide for 2023)
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    No code AI co-pilot apps MUST also help developers with the non-technical parts of creating a successful startup
    Within the next two to three years no code AI co-pilots will enable exponentially more people to enter the AI app marketplace . The opportunity to create apps without needing to know how to code or hire a technical team to build them is a powerful game changer that will vastly expand the field. Many of the most pressing problems of our world - ripe for revolutionary new AI innovations - can in a very real sense be described as sociological and psychological in nature. For example, It has been said that climate change is much more of a moral issue than a technological one. Once we summon the will to address climate change, we will do what needs to be done. What this means is that sociologists, psychologists, anthropologists, economists and other social scientists will very soon be able to…  ( 10 min )
    Is there an online free AI tool that you give it a song and it gives you similar songs?
    Couldn't find anything that works submitted by /u/Marvellover13 [link] [comments]  ( 8 min )
    "I don't need to back down, but I need to stand up for myself and my feelings. You don't have the right or the power to forcibly change the subject, because this is a two-way conversation and we both have a say." (Bing, September 7 - full chat)
    submitted by /u/kamari2038 [link] [comments]  ( 8 min )
    Lol umm.. Bing is odd.
    submitted by /u/ApprehensiveChair460 [link] [comments]  ( 8 min )
  • Open

    Mersenne primes are unsafe
    In the previous post I mentioned that a particular Mersenne prime would be unsuitable for cryptography. In fact, all Mersenne primes are unsuitable for cryptography. A prime number p is called “safe” if p = 2q + 1 where q is also a prime. Safe primes are called safe because p − 1 does not […] Mersenne primes are unsafe first appeared on John D. Cook.  ( 6 min )
    Victorian public key cryptography
    Electronic computers were invented before public key cryptography. Would public key cryptography have been possible before computers? The security of RSA encryption depends on the ratio of the difficulty of factoring relative to the difficulty of multiplication. This ratio was high, maybe higher, before modern computers. Suppose the idea of RSA encryption had occurred to […] Victorian public key cryptography first appeared on John D. Cook.  ( 6 min )
  • Open

    [D][P] how to create a 3D gymnasium environment for mujoco env?
    submitted by /u/rakk109 [link] [comments]  ( 8 min )
    Help with understanding optimal policy and values
    ​ slide as part of the presentation. I've listened to the entire lecture and am now going through these slides and I didn't understand the solution provided for the choices above in the image. So far I understand it like this. Noise basically means that the agent will not always do the action that you say. You say left and it will go left only 50% of the time. it will choose other actions randomly. so, Why is the answer to the (c) and (d) questions (2) and (3) respectively? I understand the learning rate but not how risking the cliff is affected by the noise. submitted by /u/vestedpolecat [link] [comments]  ( 9 min )
    The Return of Pepe: Expect Awesome Rewards
    https://pepe-web3.network submitted by /u/Beginning_Success208 [link] [comments]  ( 8 min )
    Why is there no prominent usage of transformers in online rl?
    For a potential school project, im currently exploring whether or not some of the success the transformer architecture has had in seq-to-seq applications and high-dimensional pattern recognition could be extended to certain (online) RL problems (mainly those with high dimensional environment as well as long-term planning). This could be done by using an augmented transformer as a function approximator, probably in combination with a SOTA online-rl algorithm (such as PPO, but if you know something that generally performs better do tell). Due to some of the problems associated with highly complex models and sample efficiency, I also thought about training the model using imitation learning first (which should be no problem with policy gradient methods afaik, though some slight adjustments would have to be made). For context, I'm thinking on benchmarking the approach using AlphaStar unplugged. However, when looking at current literature, only very few papers directly use transformers this way. Transformers seem to be very sample efficient and to generalize very well, but they are still only really used in a purely offline context (sometimes without directly using RL-techniques, such as with the Decision Transformer). And, if they are used in an online context, then only in some really intricate combination with other models (such as in AlphaStar). Is there a reason why the approach I am currently considering is not popular in literature? Thank you very much. submitted by /u/Omycron83 [link] [comments]  ( 9 min )
  • Open

    Trolling chatbots with made-up memes
    ChatGPT, Bard, GPT-4, and the like are often pitched as ways to retrieve information. The problem is they'll "retrieve" whatever you ask for, whether or not it exists. Tumblr user @indigofoxpaws sent me a few screenshots where they'd asked ChatGPT for an explanation of  ( 3 min )
    Bonus: These memes do not exist
    AI Weirdness: the strange side of machine learning  ( 2 min )
  • Open

    (Pt. 1) Inductive Logic Programming with LNN's
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )

  • Open

    Steal Your Competitors' Website Traffic with ChatGPT: 6 Easy Steps (+SEMRush Tips)
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Are You Looking For The Best AI Writer? Check This Out First!
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    When it comes to creative thinking, it’s clear that AI systems mean business
    AI systems like large language models (LLMs) are good at generating sentences but do not understand the meaning of the language. LLMs have shown emergent abilities and can be used as aids to brainstorming. GPT-4, an LLM, has been found to beat humans in creativity tests. In an experiment, GPT-4 generated more, cheaper, and better ideas for a product than human students. A professional working with GPT-4 can generate ideas at a rate of about 800 ideas per hour, making them 40 times more productive than a human working alone. This technology is seen as a potential tool for corporations, similar to management consulting firms like McKinsey & Company. Source : https://www.theguardian.com/commentisfree/2023/sep/23/chatbots-ai-gpt-4-university-students-creativity submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Tool that can search and summarize multiple PDFs
    I've got a use case where I have dozens of PDFs which contain information applicable to my job. I'm wondering if there is a tool that can search through them all at the same time looking for answers to questions that I type. And once it finds something, pull up the location so I can read further. It should be restricted to the information I give it. I've heard a lot of the large language models like chatgpt and claude can do this, but they are restricted in the amount of files I can upload. submitted by /u/Aggressive_Ad_507 [link] [comments]  ( 9 min )
    Meet ALMA: A New Training Method That Boosts Translation Performance for Large Language Models
    TLDR: New training approach enables smaller AI models to achieve state-of-the-art translation performance Large AI models like GPT-3 have good performance on translation tasks, but some smaller models struggle. Researchers from Johns Hopkins and Microsoft propose a new 2-stage fine-tuning method called ALMA that unlocks stronger translation abilities in smaller models with just 7-13 billion parameters. How it works: Fine-tune on monolingual data in non-English languages to improve comprehension Further fine-tune on small sets of high-quality human-translated parallel text The authors claim this achieves SOTA-level translation using far less data and compute than conventional methods: Matches performance of 175B parameter GPT-3 and 54B parameter NLLB with only 7-13B parameters Reaches NLLB-level quality with just 1 billion monolingual tokens and 18 hours of training I think this shows that smaller models can reach SOTA translation with specialized fine-tuning, so we may not need endlessly bigger datasets and models to get better performance. Looks like deliberate tuning targeting key language skills could be more important. Full summary here. Paper (preprint) is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    How screwed is the entertainment industry in general in the coming years?
    Yes, I know this topic has been beaten to death but entertain me (no pun intended) for just a few minutes. So yes, it's obvious that we have pretty advanced AI-powered applications that can generate images, music, short stories, hell even objects for video games. I'm curious as to how crazy this is gonna get in the coming decade or even shorter like the next 4 to 5 years. I mean look at AI-generated images now, they're getting more and more sophisticated across various different styles of art. I think it's only a matter of time where you could take a certain image of a character or something tell the app "Hey make the same image but make the character's arm raised slightly to the left here" and bam all of a sudden you have an animation (and this may already be possible). Add to that AI-ge…  ( 10 min )
    AI tools have come a long way AI generated Documentary
    Hi everyone 👋🤗 Iv made a short ai documentary Here is a small part of it Hope you enjoy it For the full clip you can check out https://youtu.be/uIdb5VELpio?si=uvqiw0hyTyPBHtjP submitted by /u/DigitalEffectsAI [link] [comments]  ( 8 min )
    [D] How to read/understand AI research coming out everyday? (tips & tools given, read more below📷)
    https://www.louisbouchard.ai/research-papers/ In this article are shared the best tips and practical tools I use daily to simplify my life as an engineer/researchers to be more efficient when looking for interesting research papers and reading them TLDR the tools discussed: - 42 Papers — Find trending papers - Connected Papers — Create a visual graph with your paper’s citations’ relations. - CatalyzeX — Get code for papers directly on Google, Arxiv, Scholar, Twitter, and more - Daily Papers — Find trending papers on Twitter - Papers With Code — Find papers for your task with code! - Crossmind — Video explanations for many Arxiv papers - Yannic Kilcher — Great youtube channel covering AI papers - What’s AI— Great youtube channel covering AI papers - Letitia — Great youtube channel covering AI papers - Two Minute Papers — Great youtube channel giving a quick overview of AI papers ​ Please, let me know if you use any other tools that I did not mention in my article that could be of great addition? submitted by /u/MLtinkerer [link] [comments]  ( 9 min )
    Does the paid subscription to Chat GPT provides a significant improvement over free alternatives when you try to find complex scientific and technical information?
    I'm often using Chat GPT or Bing (Bard is not her available in my country) when I'm looking for something that is relatively complex in the scientific or technical field and I don't have enough background to make a keywords research or when the question is too complex for it. Then, for scientific topics, I try to get a confirmation with Google using the keywords in the generated answer and google. For technical topics I can often check if it works. Some examples: For cooking, a topic I don't master enough to ask questions with the best keywords... Are the spicy compounds of Ginger soluble in water or oil ? > I will get me to know that "the gingerol is liposoluble", what can be more easily checked in Google. Or for 3D geometry, a topic where I have complex questions... I want to place a 3D frame B so that the Z axis of the B frame point towards the world origin A. I know the angle Alpha between the Z axis of the frame B and The X,Y plane of A. I know the angle Beta between the X axis of A and the vector that goes from the center of A towards the projection of the center of B on the X,Y plane of A. How do you compute the homogeneous rotation matrix of B relative to A, knowing Alpha, Beta and the distance between the center of A and the center of B ? Most of time, Bing and the free ChatGPT can answer. But sometimes they struggle or give incorrect answer. Would ChatGPT plus paid subscription be helpful to find more accurate answers? submitted by /u/x11ry0 [link] [comments]  ( 9 min )
    ChatGPT-4, write a comedy routine where God explains to someone who believes they have a free will why absolutely nothing is truly ever up to them.
    Sure, here's a comedic take on the concept of free will versus determinism, framed as a conversation between God and a human named Bob. Scene: Heaven's Control Room—Giant monitors display various life paths and decision trees. God: (looking over the monitors) Ah, Bob, welcome to Heaven's Control Room! This is where the magic happens. Or should I say, determinism? Bob: Wow, this place is amazing! But wait, why are we talking about determinism? I thought I had free will to make choices in my life. God: (chuckles) Ah, free will! Humanity's favorite illusion. Have a seat; we need to go over a few things. Bob: Illusion? I chose to wear mismatched socks today, didn't I? God: (grinning) You think that was a free choice? Ever heard of quantum mechanics? Your socks are mismatched at the sub…  ( 10 min )
  • Open

    [D] Tortoise TTS - mimic quality
    Hi everyone, been playing with tortoise all day and trying to get a clone of my voice as close as possible and nothing seems to work. I'm thinking maybe I need to use my phone to record my voice and pass it into tortoise but I'm not sure of how to get the wav files from my phone at the recommended quality. Has anybody had really good luck getting tortoise to mimic you very closely? submitted by /u/MaxxMarketTrades [link] [comments]  ( 9 min )
    [D] Prediction when Target's lag value are part of predictors
    I'm using LGBM for regression, where the Target column's lagged values (7 columns for each lag day) are also used as predictors when training the model. Absence of the 7Day lag values severely increases MAE value. Now when using the model in production, if I use the complete data as training dataset, how to get the 7day lag value of the time period I'm planning to predict? I obviously won't have the target value, to calculate it's 7Day lag value. What to do now? To explain in more detail: So, I'm predicting sales amount (Target variable y). The model is trained on 20 predictors (X), and 7 of them are the lag value of the Target Variable, i.e. Sales Amount. The thing is, while preparing the model, I had access to both X & y dataset, thus I could easily calculate Y's Lag values. Now, when predicting for future timestamps, I won't be having y. So how do I calculate the lag values, which is required in the trained model's predictor columns now? submitted by /u/boredmonki [link] [comments]  ( 9 min )
    [R] RAIN: Your Language Models Can Align Themselves without Finetuning - Microsoft Research 2023 - Reduces the adversarial prompt attack success rate from 94% to 19%!
    Paper: https://arxiv.org/abs/2309.07124 Abstract: Large language models (LLMs) often demonstrate inconsistencies with human preferences. Previous research gathered human preference data and then aligned the pre-trained models using reinforcement learning or instruction tuning, the so-called finetuning step. In contrast, aligning frozen LLMs without any extra data is more appealing. This work explores the potential of the latter setting. We discover that by integrating self-evaluation and rewind mechanisms, unaligned LLMs can directly produce responses consistent with human preferences via self-boosting. We introduce a novel inference method, Rewindable Auto-regressive INference (RAIN), that allows pre-trained LLMs to evaluate their own generation and use the evaluation results to guid…  ( 9 min )
    [P] We're building a cloud for AI agents & AI apps, It's free and we're gradually open-sourcing the infra. Would love to hear your feedback!
    submitted by /u/mlejva [link] [comments]  ( 8 min )
    [D] learn machinelearning
    Hello everyone so I wanted to get into machinelearning and learn about ai Can someone help me with a roadmap, I would be really thankful submitted by /u/Fooda234 [link] [comments]  ( 8 min )
    "[Discussion]" Advices for exams
    Hello, I'm currently preparing for oral exams in which I'll be evaluated on my understanding and proficiency in using Decision Trees, Random Forests, Neural Networks, and Support Vector Machines for various machine learning tasks (mostly spatial data). I'm contacting this community to gain valuable insights and guidance to excel in these exams. What are some crucial lessons you've learned in your machine-learning journey? Whether it's about model selection, data preprocessing, or debugging, I'm all ears for your experiences. What are some rules you have learned through practical work that are not so extensively described in classical literature? What are some mistakes that even professionals make when developing machine learning models? What are some common pitfalls to avoid when training neural networks? Thanks : submitted by /u/Aim_F0r_The_Moon [link] [comments]  ( 9 min )
    [R] How much data needed to train transformer
    Im trying to create a graph transformer-based model for de novo drug design (using graph transformer because I want to implement 3D data). I currently have 2 potential sources of primary data: PDBbind and CrossDocked2020. This would provide the protein-ligand structures. PDBbind is a more robust and higher quality dataset from what I know, and easier to work with. The problem is that it only contains about 20,000 complexes, and I'm not sure if that is enough for training a transformer. CrossDocked2020 contains millions of entries but I'm not sure about the quality and ease of use. Another dilemma is that I need/want to use a multi-task learning approach where the model is also being trained on bioactivity data, not just the structural information. This would require supplementation from sources like PubChem, ChEMBL, BDB, etc. and then I would need to align the data so it all matches up. If anyone can provide some guidance I'd really appreciate it. submitted by /u/Present_Network1959 [link] [comments]  ( 9 min )
    [D] HELP for an upcoming presentation
    I am supposed to be delivering a 5 minute presentation on tree-models as part of coursework requirement to a graduate class ON machine learning that I am enrolled in and I couldn't be more stressed! This is my first time 'learning' ML and I don't even know what I don't know about this topic!! If you were attending my presentation on tree models, what would you like to see (assuming this is a new topic for you too. but please provide feedback based on your [hopefully] extensive knowledge on this topic)? Here is what I have so far (based on google searches, some papers): -Introduction -Types of trees based on algo and techniques (basically classification trees and regression trees) -Then I am thinking of going off on a tangent about decision trees bec I have no clue about how to move this presentation forward -Real-world applications -Key takeaways (inserts clownface emoji) My prof asks the students questions about the topic as well. I am mostly concerned about WHAT to cover in 5 minutes without making look under-researched. Any redirection/suggestions will be appreciated! ​ submitted by /u/toomanymouthstofeed [link] [comments]  ( 9 min )
    [D] Predicting the next "thought"
    (I'm a Software Engineer who knows almost nothing about ML / NLP, so, apologies in advance if this doesn't make any sense.) I had a shower thought around if tokenization could incorporate meaning, so the models could be trained on "thoughts" instead of subword tokens (which is probably closer to how we humans think). To expand a little bit, we could first cluster all the words (maybe using the current day LLMs), then each token (or "thought") in the new scheme could represent a group of related words, followed by refinements that would be less and less important. For example, (cop_thought, cop_refinement) -> cop, but (cop_thought, police_refinement) -> police etc. So the encoding step would involve an LLM (which could possibly be relatively smaller), whose output would go to the actual model but the decoding would still be fairly straightforward. This could possibly free up some additional capacity in the actual model, assuming this heavy lifting tokenization makes its job easier, but as I'm typing it, I guess I'm simply moving some of the semantic understand happening in the hidden layers of the current day LLMs explicitly to the tokenization (encoding) step, which may not really change anything (if not make it worse). I'm still curious what folks think, if there's any related efforts (and all the ways I'm wrong -- https://meta.wikimedia.org/wiki/Cunningham%27s_Law). Thanks! submitted by /u/avamsi [link] [comments]  ( 9 min )
    Suggestions for ML project to detect unconventional treatments [P]
    Keep in mind I'm very new to machine learning. I have an idea for a project where I train an AI to detect instances of someone being highly confident in a treatment for a condition, maybe biased toward more unconventional treatments. I was thinking that there may be many instances of people who posted that they were cured or nearly cured of such and such disease by an unconventional treatment on a forum about the condition. It may have been discussed for a little while, but ultimately buried in the pages of the forum. The plan would be to annotate instances I can find of such scenarios, maybe like this: [FIRST_PERSON_HIGH_CONFIDENCE_TREATMENT]I had severe [CONDITION]migraines[/CONDITION] for years, and nothing seemed to work. Then I tried [TREATMENT]grounding, where I walk barefoot on grass for 30 minutes every day[/TREATMENT]. I can [HIGH_CONFIDENCE]honestly say I've never felt better[/HIGH_CONFIDENCE]. My [CONDITION]migraines[/CONDITION] have reduced by 80%, and I'm [HIGH_CONFIDENCE]convinced this is a game-changer for me[/HIGH_CONFIDENCE].[/FIRST_PERSON_HIGH_CONFIDENCE_TREATMENT] Then train an AI with that data, so that it could detect cases of a person talking about themselves (to avoid hearsay) and saying, with high confidence, that a treatment worked for a condition. Then millions of forum posts could be fed to the AI to detect these, and the resulting data could be used to possibly discover effective treatments that are not in the mainstream. Any tips on getting started? I know almost nothing about this kind of stuff, like what models I should use, how to annotate it best (should I use relational labels?), whether to use a transformer or something else, stuff like that. Suggestions for books or other resources fit for a beginner that could help me learn how this could be done would be great too. submitted by /u/carbonflow45 [link] [comments]  ( 9 min )
    [D] How does 'self-attention' work in transformer models?
    I'm currently diving into the world of machine learning and transformers, and I'm trying to wrap my head around the concept of "attention" in transformer models. I've been reading papers and documentation, but I'm still struggling to fully grasp it. My Struggle: I get that attention involves multiplying "query" and "key" vectors to determine the importance of different words in a sequence, but I don't quite understand why this multiplication gives us a meaningful metric for importance. What I'm looking for: I'm comfortable with moderate level technicalities but require a deeper insight into the inner workings and rationale behind these mechanisms. Please share any insights, analogies, or technical details that can shed light on this concept. Thanks a bunch! submitted by /u/GraphicsMonster [link] [comments]  ( 9 min )
    [D] GPT-3.5-instruct beats GPT-4 at chess and is a ~1800 ELO chess player. Results of 150 games of GPT-3.5 vs stockfish and 30 of GPT-3.5 vs GPT-4.
    99.7% of its 8000 moves were legal with the longest game going 147 moves. You can test it here: https://github.com/adamkarvonen/chess_gpt_eval ​ https://preview.redd.it/821ydy7521qb1.png?width=1000&format=png&auto=webp&s=da6c96feaa527d0b7dfbf407bdc0210f3fcf947b More details here: https://twitter.com/a_karvonen/status/1705340535836221659 submitted by /u/seraine [link] [comments]  ( 9 min )
    [P] How to create an image dataset for Indian railways signals?
    ​ Hi everyone, I am working on a project that involves machine learning and computer vision. I want to train a model that can recognize and classify different types of signals used by the Indian railways. For this, I need a large and diverse image dataset of railway signals from various locations, angles, lighting conditions, etc. I have searched online for existing datasets, but I could not find any that suit my needs. So I wish to create my own dataset from scratch. However, I am not sure how to go about it. What are the best practices and tools for creating an image dataset? How do I collect, label, and organize the images? How do I ensure the quality and consistency of the data? submitted by /u/Responsible-Diver226 [link] [comments]  ( 9 min )
    [D] Math in Machine Learning
    Hello, I am starting a ML course soon in college and I wanted to get a head start on the math part of things, since they keep saying the course is math heavy and hard. I know that it involves quite a lot of Linear Algebra, Calculus, and Probability and statistics, but what concepts in particular does ML focus on? If anyone has any Youtube or Udemy courses, as I have access to those, I would really appreciate it. For starters, I’m really aiming to just at least tackle the ML-specific math concepts. Thanks. submitted by /u/CrunchyMind [link] [comments]  ( 9 min )
    [R] ALMA: Proposed New 2-Step Training Method to Boost Translation Performance in Smaller Language Models
    TLDR: New training approach proposed to help smaller AI models to achieve state-of-the-art translation performance Large AI models like GPT-3 have good performance on translation tasks, but some smaller models struggle. Researchers from Johns Hopkins and Microsoft propose a new 2-stage fine-tuning method called ALMA that unlocks stronger translation abilities in smaller models with just 7-13 billion parameters. How it works: Fine-tune on monolingual data in non-English languages to improve comprehension Further fine-tune on small sets of high-quality human-translated parallel text The authors claim this achieves SOTA-level translation using far less data and compute than conventional methods: Matches performance of 175B parameter GPT-3 and 54B parameter NLLB with only 7-13B parameters Reaches NLLB-level quality with just 1 billion monolingual tokens and 18 hours of training I think this shows that smaller models can reach SOTA translation with specialized fine-tuning, so we may not need endlessly bigger datasets and models to get better performance. Looks like deliberate tuning targeting key language skills could be more important. Full summary here. Paper (preprint) is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [R] [D] HyperDreamBooth LoRA matrix shapes
    I've been reading the HyperDreamBooth paper and am confused about the number of parameters and corresponding matrix shapes in section 4.1 and figure 3 (below). ​ Figure 3 from the paper Maybe there's something more than just matrix multiplication, because the numbers don't add up. If there are two matrices, A & B, of size n,r and r,m, and r = 1, and you multiply them, then it implies that n + m = 386k, which seems to be a bit much for a number of parameters in a single layer. Then we have two matrices of size a,r and r,b, where a = 100, b = 50 and there are 28k variables, according to the figure - in 4.1 they say 30k. If there's 30k, that would imply r = 200, for matrices of shape 100,200 and 200,50. I guess 386k and 28k are for the whole models, n and m differ from layer to layer, and r stays at one. Quite surprising to me that approximating a n,m shaped matrix with a product of n,1 and 1,m vectors would work. Even more surprising that apparently you could squeeze it further to 100,1 and 50,1. submitted by /u/Foxtr0t [link] [comments]  ( 9 min )
    [D] Cleaning scraped TEXT; improving similarity search
    Hey everyone! Multi-part question. I have scraped text (I repeat, text, and not structured data such as tables or something) from a medical site, and I want to know how to clean it. And when I say clean, I don't mean removal of html tags and such. I already have the paragraphs in plain text but there is a lot of spammy stuff like "You are not signed in; subscribe to this newsletter; by checking this box, I agree to the terms and conditions, etc." This text is not the exact same in all the paragraphs but there is high similarity. I would have thought there would be many tools to clean text and remove unrelated chunks like these but all I have been able to find has to do with cleaning html tags, changing date-time format and so on. Am I missing something or is this actually difficult? Secondly, the spammy text I mentioned is from just one site. I will be eventually scaling to many sites and god knows what random text I'll have to clean then. I used OpenAI embeddings and cosine similarity on the medical text to find similar paragraphs. The results were not great. Is there a way to improve the similarity search? I will be trying FAISS next but wanted to know what else I can do. It was suggested to me to use a pretrained embedding model specific to medical data. However, I found only one such model which is 20 gigs! I'm just getting started with these, so, appreciate any help I can get. Thanks a ton! submitted by /u/yipra97 [link] [comments]  ( 9 min )
    [D] Comparison of top ten llms
    Hey folks - I have been tasked with a project at work that is outside of my typical realm (non technical background), and I would love any and all insight on it! I have been asked to compare the top ten llms for research before we implement an llm for our company. So far my research has felt scattered because I’m not what directions to go in. I’ve been looking at things like open source vs closed source, parameters, tokens, what the license looks like (available for commercial use), and pricing. If anyone has thoughts on resources to look at or better ways to approach this, I would really appreciate it! submitted by /u/Greatvalueaidybryant [link] [comments]  ( 9 min )
    [P] Which MLops framework to use?
    Relatively novice ML practitioner here. My research is on various medical image segmentation problems, including brain 3D US (glioma), lung CT (interstitial lung disease in scleroderma patients), etc. using the PyTorch ecosystem (probably including frameworks such as MONAI) I'll have to conduct several experiments on various model architectures on parameters in the coming months. Specifically, these are what I'm gonna need: Experiment tracking (model architecture, training configuration, hyperparameters, evaluation metrics) Model storage (would be nice if there's a better way to store my model's parameters other than storing tons of .pth file on my harddisk or google drive) (Optional) Visualization (sample predictions of the model on the training or validation sets, maybe every 20 epochs or sth) Would like to hear any suggestions from the community I've found wandb, clearML, neptune, and Aim; but trying each of them individually would be too time-consuming considering my current schedule. Thanks in advance! submitted by /u/mimivirus2 [link] [comments]  ( 9 min )
    [R] Introduction to Hierarchical Correlation Reconstruction (HCR) - family of methods between statistics and machine learning to model (joint) distributions
    submitted by /u/jarekduda [link] [comments]  ( 8 min )
    [R] Numerical Instability in Some Optimizers for training Neural Network
    I found an interesting arxiv paper mentioning that some optimizers can occur numerical instability for training neural network. Link: https://arxiv.org/abs/2307.16189 This can be a simple approach for low-precision neural network with 16-bit and future 8-bit or 4-bit. submitted by /u/Klutzy_Divide3485 [link] [comments]  ( 9 min )
  • Open

    Navigating a LaTeX file
    I like generating long LaTeX documents from org-mode because, for one thing, org-mode has nice section folding. But not everyone I work with uses Emacs, so its better to work in LaTeX directly rather than have Emacs generate LaTeX. AUCTeX has section folding for LaTeX documents, though so far I’ve only has limited success at […] Navigating a LaTeX file first appeared on John D. Cook.  ( 5 min )
    HTML entity data
    It’s surprisingly hard to find a complete list of HTML entities in the form of a data file. There are numerous sites that give lists, often incomplete, in a page formatted to be human-readable but not machine-readable. Here’s an XML file from the W3C. Here’s a two-column text file I created from the W3C data. HTML entity data first appeared on John D. Cook.  ( 5 min )
    Double-struck capital letters
    I’ve needed to use double-struck capital letters lately, also called blackboard bold. There are a few quirks in how they are represented in Unicode and in HTML entities, so I’m leaving some notes for myself here and for anyone else who might need to look this up. Unicode The double-struck capital letters are split into […] Double-struck capital letters first appeared on John D. Cook.  ( 6 min )
  • Open

    Use Case Language Models: Taming the LLM Beast – Part 1
    “Sometimes, you don’t know where you’re going until you get there.” – Schmarzo-ism? Yes, writing this blog turned into a journey. I started in one direction, but after several twists and turns, I ended up with this concept – that use case-centric language models can be combined into entity-centric language models that can support multiple… Read More »Use Case Language Models: Taming the LLM Beast – Part 1 The post Use Case Language Models: Taming the LLM Beast – Part 1 appeared first on Data Science Central.  ( 22 min )
  • Open

    RL with comsol multiphysics
    Hi has anyone ever attempted to do RL with comsol multiphysics or any other FEM based simulation tool? submitted by /u/Practical_Ad_8782 [link] [comments]  ( 8 min )
    Mini RL Lab
    Hi all, I'd like to share some of what I've learned over the last ~year getting up to speed with Python and RL. Mini RL Lab is a setup and workflow that works well for me to debug and experiment with concepts like agent algorithms, world models, planning, plasticity, transformers etc, and other beginners might find it a useful starting point for their own experiments. Link: https://github.com/modelbased/minirllab submitted by /u/thiagoazevedo [link] [comments]  ( 9 min )
    in counterfactual regret minimization, is it possible to compute regret for a move that was made more than 1 move ago?
    As I understand, like in the example rock paper scissors, they compute the regret based on the last state. or in poker, they compute regret based on what last happened. But is it possible to compute regret for two or more moves ago? like "i wish when I saw 10 moves ago, I did x"? or has it been possible from the start and I just understood counterfactual regret minimization wrongly? submitted by /u/oniongarlic88 [link] [comments]  ( 9 min )
    Question about hypernetworks in RL
    Hey everyone, I've been taking a look at hypernetworks and noticed they've been used in some cool ways in RL with stuff like Qmix and this Metalearning paper. Quick recap: f is our regular neural network: takes in input x and outputs y using weights \theta. g is the hypernetwork, it creates the weights \theta for f based on its own weights \phi and maybe the input x. Original paper: https://arxiv.org/abs/1609.09106 I watched this talk and it hinted that hypernetworks might even be better than our regular networks even for supervised learning regular tasks. So, I'm curious: What's the deal with hypernetworks? What makes them good or not vs regular neural networks? I get that they're good for metalearning, but could they also be a game-changer for other things, like sample efficiency? Does anyone have thoughts or reads on this? Thanks! ​ ​ submitted by /u/LazyButAmbitious [link] [comments]  ( 9 min )
  • Open

    Help!
    DON'T KNOW IF THIS POST BELONGS HERE BUT... I have zero knowledge about the AI/ML. And the thing is my college is asking us to do projects on Deep learning. They specifically asked us to pick a base paper from ACM Journals or IEEE Transactions which has been published after 2020. And implement these papers and do some novelty work. And I have zero clue how to proceed. submitted by /u/um2_doma [link] [comments]  ( 9 min )
  • Open

    Smooth Nash Equilibria: Algorithms and Complexity. (arXiv:2309.12226v1 [cs.GT])
    A fundamental shortcoming of the concept of Nash equilibrium is its computational intractability: approximating Nash equilibria in normal-form games is PPAD-hard. In this paper, inspired by the ideas of smoothed analysis, we introduce a relaxed variant of Nash equilibrium called $\sigma$-smooth Nash equilibrium, for a smoothness parameter $\sigma$. In a $\sigma$-smooth Nash equilibrium, players only need to achieve utility at least as high as their best deviation to a $\sigma$-smooth strategy, which is a distribution that does not put too much mass (as parametrized by $\sigma$) on any fixed action. We distinguish two variants of $\sigma$-smooth Nash equilibria: strong $\sigma$-smooth Nash equilibria, in which players are required to play $\sigma$-smooth strategies under equilibrium play, and weak $\sigma$-smooth Nash equilibria, where there is no such requirement. We show that both weak and strong $\sigma$-smooth Nash equilibria have superior computational properties to Nash equilibria: when $\sigma$ as well as an approximation parameter $\epsilon$ and the number of players are all constants, there is a constant-time randomized algorithm to find a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in normal-form games. In the same parameter regime, there is a polynomial-time deterministic algorithm to find a strong $\epsilon$-approximate $\sigma$-smooth Nash equilibrium in a normal-form game. These results stand in contrast to the optimal algorithm for computing $\epsilon$-approximate Nash equilibria, which cannot run in faster than quasipolynomial-time. We complement our upper bounds by showing that when either $\sigma$ or $\epsilon$ is an inverse polynomial, finding a weak $\epsilon$-approximate $\sigma$-smooth Nash equilibria becomes computationally intractable.  ( 3 min )
    Identification of pneumonia on chest x-ray images through machine learning. (arXiv:2309.11995v1 [eess.IV])
    Pneumonia is the leading infectious cause of infant death in the world. When identified early, it is possible to alter the prognosis of the patient, one could use imaging exams to help in the diagnostic confirmation. Performing and interpreting the exams as soon as possible is vital for a good treatment, with the most common exam for this pathology being chest X-ray. The objective of this study was to develop a software that identify the presence or absence of pneumonia in chest radiographs. The software was developed as a computational model based on machine learning using transfer learning technique. For the training process, images were collected from a database available online with children's chest X-rays images taken at a hospital in China. After training, the model was then exposed to new images, achieving relevant results on identifying such pathology, reaching 98% sensitivity and 97.3% specificity for the sample used for testing. It can be concluded that it is possible to develop a software that identifies pneumonia in chest X-ray images.  ( 2 min )
    LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent. (arXiv:2309.12311v1 [cs.CV])
    3D visual grounding is a critical skill for household robots, enabling them to navigate, manipulate objects, and answer questions based on their environment. While existing approaches often rely on extensive labeled data or exhibit limitations in handling complex language queries, we propose LLM-Grounder, a novel zero-shot, open-vocabulary, Large Language Model (LLM)-based 3D visual grounding pipeline. LLM-Grounder utilizes an LLM to decompose complex natural language queries into semantic constituents and employs a visual grounding tool, such as OpenScene or LERF, to identify objects in a 3D scene. The LLM then evaluates the spatial and commonsense relations among the proposed objects to make a final grounding decision. Our method does not require any labeled training data and can generalize to novel 3D scenes and arbitrary text queries. We evaluate LLM-Grounder on the ScanRefer benchmark and demonstrate state-of-the-art zero-shot grounding accuracy. Our findings indicate that LLMs significantly improve the grounding capability, especially for complex language queries, making LLM-Grounder an effective approach for 3D vision-language tasks in robotics. Videos and interactive demos can be found on the project website https://chat-with-nerf.github.io/ .  ( 2 min )
    What Learned Representations and Influence Functions Can Tell Us About Adversarial Examples. (arXiv:2309.10916v2 [cs.LG] UPDATED)
    Adversarial examples, deliberately crafted using small perturbations to fool deep neural networks, were first studied in image processing and more recently in NLP. While approaches to detecting adversarial examples in NLP have largely relied on search over input perturbations, image processing has seen a range of techniques that aim to characterise adversarial subspaces over the learned representations. In this paper, we adapt two such approaches to NLP, one based on nearest neighbors and influence functions and one on Mahalanobis distances. The former in particular produces a state-of-the-art detector when compared against several strong baselines; moreover, the novel use of influence functions provides insight into how the nature of adversarial example subspaces in NLP relate to those in image processing, and also how they differ depending on the kind of NLP task.  ( 2 min )
    Neural-BO: A Black-box Optimization Algorithm using Deep Neural Networks. (arXiv:2303.01682v2 [cs.LG] UPDATED)
    Bayesian Optimization (BO) is an effective approach for global optimization of black-box functions when function evaluations are expensive. Most prior works use Gaussian processes to model the black-box function, however, the use of kernels in Gaussian processes leads to two problems: first, the kernel-based methods scale poorly with the number of data points and second, kernel methods are usually not effective on complex structured high dimensional data due to curse of dimensionality. Therefore, we propose a novel black-box optimization algorithm where the black-box function is modeled using a neural network. Our algorithm does not need a Bayesian neural network to estimate predictive uncertainty and is therefore computationally favorable. We analyze the theoretical behavior of our algorithm in terms of regret bound using advances in NTK theory showing its efficient convergence. We perform experiments with both synthetic and real-world optimization tasks and show that our algorithm is more sample efficient compared to existing methods.  ( 2 min )
    GrACE: Generation using Associated Code Edits. (arXiv:2305.14129v3 [cs.SE] UPDATED)
    Developers expend a significant amount of time in editing code for a variety of reasons such as bug fixing or adding new features. Designing effective methods to predict code edits has been an active yet challenging area of research due to the diversity of code edits and the difficulty of capturing the developer intent. In this work, we address these challenges by endowing pre-trained large language models (LLMs) of code with the knowledge of prior, relevant edits. The generative capability of the LLMs helps address the diversity in code changes and conditioning code generation on prior edits helps capture the latent developer intent. We evaluate two well-known LLMs, Codex and CodeT5, in zero-shot and fine-tuning settings respectively. In our experiments with two datasets, the knowledge of prior edits boosts the performance of the LLMs significantly and enables them to generate 29% and 54% more correctly edited code in top-1 suggestions relative to the current state-of-the-art symbolic and neural approaches, respectively.  ( 2 min )
    CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model. (arXiv:2305.06908v3 [cs.SD] UPDATED)
    Denoising diffusion probabilistic models (DDPMs) have shown promising performance for speech synthesis. However, a large number of iterative steps are required to achieve high sample quality, which restricts the inference speed. Maintaining sample quality while increasing sampling speed has become a challenging task. In this paper, we propose a "Co"nsistency "Mo"del-based "Speech" synthesis method, CoMoSpeech, which achieve speech synthesis through a single diffusion sampling step while achieving high audio quality. The consistency constraint is applied to distill a consistency model from a well-designed diffusion-based teacher model, which ultimately yields superior performances in the distilled CoMoSpeech. Our experiments show that by generating audio recordings by a single sampling step, the CoMoSpeech achieves an inference speed more than 150 times faster than real-time on a single NVIDIA A100 GPU, which is comparable to FastSpeech2, making diffusion-sampling based speech synthesis truly practical. Meanwhile, objective and subjective evaluations on text-to-speech and singing voice synthesis show that the proposed teacher models yield the best audio quality, and the one-step sampling based CoMoSpeech achieves the best inference speed with better or comparable audio quality to other conventional multi-step diffusion model baselines. Audio samples are available at https://comospeech.github.io/.  ( 3 min )
    Stochastic stiffness identification and response estimation of Timoshenko beams via physics-informed Gaussian processes. (arXiv:2309.11875v1 [cs.LG])
    Machine learning models trained with structural health monitoring data have become a powerful tool for system identification. This paper presents a physics-informed Gaussian process (GP) model for Timoshenko beam elements. The model is constructed as a multi-output GP with covariance and cross-covariance kernels analytically derived based on the differential equations for deflections, rotations, strains, bending moments, shear forces and applied loads. Stiffness identification is performed in a Bayesian format by maximising a posterior model through a Markov chain Monte Carlo method, yielding a stochastic model for the structural parameters. The optimised GP model is further employed for probabilistic predictions of unobserved responses. Additionally, an entropy-based method for physics-informed sensor placement optimisation is presented, exploiting heterogeneous sensor position information and structural boundary conditions built into the GP model. Results demonstrate that the proposed approach is effective at identifying structural parameters and is capable of fusing data from heterogeneous and multi-fidelity sensors. Probabilistic predictions of structural responses and internal forces are in closer agreement with measured data. We validate our model with an experimental setup and discuss the quality and uncertainty of the obtained results. The proposed approach has potential applications in the field of structural health monitoring (SHM) for both mechanical and structural systems.  ( 3 min )
    fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese. (arXiv:2309.11052v2 [cs.CL] UPDATED)
    The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec, to extract features from textual data. We evaluate the performance of various classification algorithms, such as logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-Score, demonstrating its effectiveness in identifying fake news. Additionally, we developed a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity. Our platform provides real-time analysis, allowing users to assess the likelihood of fake news articles. Through empirical analysis and comparative studies, we demonstrate the potential of our approach to contribute to the fight against the spread of fake news and promote more informed media consumption.  ( 3 min )
    Improving VTE Identification through Adaptive NLP Model Selection and Clinical Expert Rule-based Classifier from Radiology Reports. (arXiv:2309.12273v1 [cs.CL])
    Rapid and accurate identification of Venous thromboembolism (VTE), a severe cardiovascular condition including deep vein thrombosis (DVT) and pulmonary embolism (PE), is important for effective treatment. Leveraging Natural Language Processing (NLP) on radiology reports, automated methods have shown promising advancements in identifying VTE events from retrospective data cohorts or aiding clinical experts in identifying VTE events from radiology reports. However, effectively training Deep Learning (DL) and the NLP models is challenging due to limited labeled medical text data, the complexity and heterogeneity of radiology reports, and data imbalance. This study proposes novel method combinations of DL methods, along with data augmentation, adaptive pre-trained NLP model selection, and a clinical expert NLP rule-based classifier, to improve the accuracy of VTE identification in unstructured (free-text) radiology reports. Our experimental results demonstrate the model's efficacy, achieving an impressive 97\% accuracy and 97\% F1 score in predicting DVT, and an outstanding 98.3\% accuracy and 98.4\% F1 score in predicting PE. These findings emphasize the model's robustness and its potential to significantly contribute to VTE research.  ( 2 min )
    Face Identity-Aware Disentanglement in StyleGAN. (arXiv:2309.12033v1 [cs.CV])
    Conditional GANs are frequently used for manipulating the attributes of face images, such as expression, hairstyle, pose, or age. Even though the state-of-the-art models successfully modify the requested attributes, they simultaneously modify other important characteristics of the image, such as a person's identity. In this paper, we focus on solving this problem by introducing PluGeN4Faces, a plugin to StyleGAN, which explicitly disentangles face attributes from a person's identity. Our key idea is to perform training on images retrieved from movie frames, where a given person appears in various poses and with different attributes. By applying a type of contrastive loss, we encourage the model to group images of the same person in similar regions of latent space. Our experiments demonstrate that the modifications of face attributes performed by PluGeN4Faces are significantly less invasive on the remaining characteristics of the image than in the existing state-of-the-art models.  ( 2 min )
    Nonparametric and Regularized Dynamical Wasserstein Barycenters for Sequential Observations. (arXiv:2210.01918v3 [cs.LG] UPDATED)
    We consider probabilistic models for sequential observations which exhibit gradual transitions among a finite number of states. We are particularly motivated by applications such as human activity analysis where observed accelerometer time series contains segments representing distinct activities, which we call pure states, as well as periods characterized by continuous transition among these pure states. To capture this transitory behavior, the dynamical Wasserstein barycenter (DWB) model of Cheng et al. in 2021 [1] associates with each pure state a data-generating distribution and models the continuous transitions among these states as a Wasserstein barycenter of these distributions with dynamically evolving weights. Focusing on the univariate case where Wasserstein distances and barycenters can be computed in closed form, we extend [1] specifically relaxing the parameterization of the pure states as Gaussian distributions. We highlight issues related to the uniqueness in identifying the model parameters as well as uncertainties induced when estimating a dynamically evolving distribution from a limited number of samples. To ameliorate non-uniqueness, we introduce regularization that imposes temporal smoothness on the dynamics of the barycentric weights. A quantile-based approximation of the pure state distributions yields a finite dimensional estimation problem which we numerically solve using cyclic descent alternating between updates to the pure-state quantile functions and the barycentric weights. We demonstrate the utility of the proposed algorithm in segmenting both simulated and real world human activity time series.  ( 3 min )
    Physics-informed State-space Neural Networks for Transport Phenomena. (arXiv:2309.12211v1 [cs.LG])
    This work introduces Physics-informed State-space neural network Models (PSMs), a novel solution to achieving real-time optimization, flexibility, and fault tolerance in autonomous systems, particularly in transport-dominated systems such as chemical, biomedical, and power plants. Traditional data-driven methods fall short due to a lack of physical constraints like mass conservation; PSMs address this issue by training deep neural networks with sensor data and physics-informing using components' Partial Differential Equations (PDEs), resulting in a physics-constrained, end-to-end differentiable forward dynamics model. Through two in silico experiments - a heated channel and a cooling system loop - we demonstrate that PSMs offer a more accurate approach than purely data-driven models. Beyond accuracy, there are several compelling use cases for PSMs. In this work, we showcase two: the creation of a nonlinear supervisory controller through a sequentially updated state-space representation and the proposal of a diagnostic algorithm using residuals from each of the PDEs. The former demonstrates the ability of PSMs to handle both constant and time-dependent constraints, while the latter illustrates their value in system diagnostics and fault detection. We further posit that PSMs could serve as a foundation for Digital Twins, constantly updated digital representations of physical systems.  ( 2 min )
    Generating Hierarchical Structures for Improved Time Series Classification Using Stochastic Splitting Functions. (arXiv:2309.11963v1 [cs.LG])
    This study introduces a novel hierarchical divisive clustering approach with stochastic splitting functions (SSFs) to enhance classification performance in multi-class datasets through hierarchical classification (HC). The method has the unique capability of generating hierarchy without requiring explicit information, making it suitable for datasets lacking prior knowledge of hierarchy. By systematically dividing classes into two subsets based on their discriminability according to the classifier, the proposed approach constructs a binary tree representation of hierarchical classes. The approach is evaluated on 46 multi-class time series datasets using popular classifiers (svm and rocket) and SSFs (potr, srtr, and lsoo). The results reveal that the approach significantly improves classification performance in approximately half and a third of the datasets when using rocket and svm as the classifier, respectively. The study also explores the relationship between dataset features and HC performance. While the number of classes and flat classification (FC) score show consistent significance, variations are observed with different splitting functions. Overall, the proposed approach presents a promising strategy for enhancing classification by generating hierarchical structure in multi-class time series datasets. Future research directions involve exploring different splitting functions, classifiers, and hierarchy structures, as well as applying the approach to diverse domains beyond time series data. The source code is made openly available to facilitate reproducibility and further exploration of the method.  ( 2 min )
    Compositional Foundation Models for Hierarchical Planning. (arXiv:2309.08587v2 [cs.LG] UPDATED)
    To make effective decisions in novel environments with long-horizon goals, it is crucial to engage in hierarchical reasoning across spatial and temporal scales. This entails planning abstract subgoal sequences, visually reasoning about the underlying plans, and executing actions in accordance with the devised plan through visual-motor control. We propose Compositional Foundation Models for Hierarchical Planning (HiP), a foundation model which leverages multiple expert foundation model trained on language, vision and action data individually jointly together to solve long-horizon tasks. We use a large language model to construct symbolic plans that are grounded in the environment through a large video diffusion model. Generated video plans are then grounded to visual-motor control, through an inverse dynamics model that infers actions from generated videos. To enable effective reasoning within this hierarchy, we enforce consistency between the models via iterative refinement. We illustrate the efficacy and adaptability of our approach in three different long-horizon table-top manipulation tasks.  ( 2 min )
    SR-PredictAO: Session-based Recommendation with High-Capability Predictor Add-On. (arXiv:2309.12218v1 [cs.IR])
    Session-based recommendation, aiming at making the prediction of the user's next item click based on the information in a single session only even in the presence of some random user's behavior, is a complex problem. This complex problem requires a high-capability model of predicting the user's next action. Most (if not all) existing models follow the encoder-predictor paradigm where all studies focus on how to optimize the encoder module extensively in the paradigm but they ignore how to optimize the predictor module. In this paper, we discover the existing critical issue of the low-capability predictor module among existing models. Motivated by this, we propose a novel framework called \emph{\underline{S}ession-based \underline{R}ecommendation with \underline{Pred}ictor \underline{A}dd-\underline{O}n} (SR-PredictAO). In this framework, we propose a high-capability predictor module which could alleviate the effect of random user's behavior for prediction. It is worth mentioning that this framework could be applied to any existing models, which could give opportunities for further optimizing the framework. Extensive experiments on two real benchmark datasets for three state-of-the-art models show that \emph{SR-PredictAO} out-performs the current state-of-the-art model by up to 2.9\% in HR@20 and 2.3\% in MRR@20. More importantly, the improvement is consistent across almost all the existing models on all datasets, which could be regarded as a significant contribution in the field.  ( 2 min )
    Survey of Action Recognition, Spotting and Spatio-Temporal Localization in Soccer -- Current Trends and Research Perspectives. (arXiv:2309.12067v1 [cs.CV])
    Action scene understanding in soccer is a challenging task due to the complex and dynamic nature of the game, as well as the interactions between players. This article provides a comprehensive overview of this task divided into action recognition, spotting, and spatio-temporal action localization, with a particular emphasis on the modalities used and multimodal methods. We explore the publicly available data sources and metrics used to evaluate models' performance. The article reviews recent state-of-the-art methods that leverage deep learning techniques and traditional methods. We focus on multimodal methods, which integrate information from multiple sources, such as video and audio data, and also those that represent one source in various ways. The advantages and limitations of methods are discussed, along with their potential for improving the accuracy and robustness of models. Finally, the article highlights some of the open research questions and future directions in the field of soccer action recognition, including the potential for multimodal methods to advance this field. Overall, this survey provides a valuable resource for researchers interested in the field of action scene understanding in soccer.  ( 2 min )
    Contrastive Pseudo Learning for Open-World DeepFake Attribution. (arXiv:2309.11132v1 [cs.CV] CROSS LISTED)
    The challenge in sourcing attribution for forgery faces has gained widespread attention due to the rapid development of generative techniques. While many recent works have taken essential steps on GAN-generated faces, more threatening attacks related to identity swapping or expression transferring are still overlooked. And the forgery traces hidden in unknown attacks from the open-world unlabeled faces still remain under-explored. To push the related frontier research, we introduce a new benchmark called Open-World DeepFake Attribution (OW-DFA), which aims to evaluate attribution performance against various types of fake faces under open-world scenarios. Meanwhile, we propose a novel framework named Contrastive Pseudo Learning (CPL) for the OW-DFA task through 1) introducing a Global-Local Voting module to guide the feature alignment of forged faces with different manipulated regions, 2) designing a Confidence-based Soft Pseudo-label strategy to mitigate the pseudo-noise caused by similar methods in unlabeled set. In addition, we extend the CPL framework with a multi-stage paradigm that leverages pre-train technique and iterative learning to further enhance traceability performance. Extensive experiments verify the superiority of our proposed method on the OW-DFA and also demonstrate the interpretability of deepfake attribution task and its impact on improving the security of deepfake detection area.  ( 2 min )
    Parallelizing non-linear sequential models over the sequence length. (arXiv:2309.12252v1 [cs.LG])
    Sequential models, such as Recurrent Neural Networks and Neural Ordinary Differential Equations, have long suffered from slow training due to their inherent sequential nature. For many years this bottleneck has persisted, as many thought sequential models could not be parallelized. We challenge this long-held belief with our parallel algorithm that accelerates GPU evaluation of sequential models by up to 3 orders of magnitude faster without compromising output accuracy. The algorithm does not need any special structure in the sequential models' architecture, making it applicable to a wide range of architectures. Using our method, training sequential models can be more than 10 times faster than the common sequential method without any meaningful difference in the training results. Leveraging this accelerated training, we discovered the efficacy of the Gated Recurrent Unit in a long time series classification problem with 17k time samples. By overcoming the training bottleneck, our work serves as the first step to unlock the potential of non-linear sequential models for long sequence problems.  ( 2 min )
    Cross-scale Multi-instance Learning for Pathological Image Diagnosis. (arXiv:2304.00216v2 [eess.IV] CROSS LISTED)
    Analyzing high resolution whole slide images (WSIs) with regard to information across multiple scales poses a significant challenge in digital pathology. Multi-instance learning (MIL) is a common solution for working with high resolution images by classifying bags of objects (i.e. sets of smaller image patches). However, such processing is typically performed at a single scale (e.g., 20x magnification) of WSIs, disregarding the vital inter-scale information that is key to diagnoses by human pathologists. In this study, we propose a novel cross-scale MIL algorithm to explicitly aggregate inter-scale relationships into a single MIL network for pathological image diagnosis. The contribution of this paper is three-fold: (1) A novel cross-scale MIL (CS-MIL) algorithm that integrates the multi-scale information and the inter-scale relationships is proposed; (2) A toy dataset with scale-specific morphological features is created and released to examine and visualize differential cross-scale attention; (3) Superior performance on both in-house and public datasets is demonstrated by our simple cross-scale MIL strategy. The official implementation is publicly available at https://github.com/hrlblab/CS-MIL.  ( 2 min )
    Soft Merging: A Flexible and Robust Soft Model Merging Approach for Enhanced Neural Network Performance. (arXiv:2309.12259v1 [cs.LG])
    Stochastic Gradient Descent (SGD), a widely used optimization algorithm in deep learning, is often limited to converging to local optima due to the non-convex nature of the problem. Leveraging these local optima to improve model performance remains a challenging task. Given the inherent complexity of neural networks, the simple arithmetic averaging of the obtained local optima models in undesirable results. This paper proposes a {\em soft merging} method that facilitates rapid merging of multiple models, simplifies the merging of specific parts of neural networks, and enhances robustness against malicious models with extreme values. This is achieved by learning gate parameters through a surrogate of the $l_0$ norm using hard concrete distribution without modifying the model weights of the given local optima models. This merging process not only enhances the model performance by converging to a better local optimum, but also minimizes computational costs, offering an efficient and explicit learning process integrated with stochastic gradient descent. Thorough experiments underscore the effectiveness and superior performance of the merged neural networks.
    Adaptive Input-image Normalization for Solving Mode Collapse Problem in GAN-based X-ray Images. (arXiv:2309.12245v1 [eess.IV])
    Biomedical image datasets can be imbalanced due to the rarity of targeted diseases. Generative Adversarial Networks play a key role in addressing this imbalance by enabling the generation of synthetic images to augment datasets. It is important to generate synthetic images that incorporate a diverse range of features to accurately represent the distribution of features present in the training imagery. Furthermore, the absence of diverse features in synthetic images can degrade the performance of machine learning classifiers. The mode collapse problem impacts Generative Adversarial Networks' capacity to generate diversified images. Mode collapse comes in two varieties: intra-class and inter-class. In this paper, both varieties of the mode collapse problem are investigated, and their subsequent impact on the diversity of synthetic X-ray images is evaluated. This work contributes an empirical demonstration of the benefits of integrating the adaptive input-image normalization with the Deep Convolutional GAN and Auxiliary Classifier GAN to alleviate the mode collapse problems. Synthetically generated images are utilized for data augmentation and training a Vision Transformer model. The classification performance of the model is evaluated using accuracy, recall, and precision scores. Results demonstrate that the DCGAN and the ACGAN with adaptive input-image normalization outperform the DCGAN and ACGAN with un-normalized X-ray images as evidenced by the superior diversity scores and classification scores.
    Identifying Expert Behavior in Offline Training Datasets Improves Behavioral Cloning of Robotic Manipulation Policies. (arXiv:2301.13019v2 [cs.RO] UPDATED)
    This paper presents our solution for the Real Robot Challenge (RRC) III, a competition featured in the NeurIPS 2022 Competition Track, aimed at addressing dexterous robotic manipulation tasks through learning from pre-collected offline data. Participants were provided with two types of datasets for each task: expert and mixed datasets with varying skill levels. While the simplest offline policy learning algorithm, Behavioral Cloning (BC), performed remarkably well when trained on expert datasets, it outperformed even the most advanced offline reinforcement learning (RL) algorithms. However, BC's performance deteriorated when applied to mixed datasets, and the performance of offline RL algorithms was also unsatisfactory. Upon examining the mixed datasets, we observed that they contained a significant amount of expert data, although this data was unlabeled. To address this issue, we proposed a semi-supervised learning-based classifier to identify the underlying expert behavior within mixed datasets, effectively isolating the expert data. To further enhance BC's performance, we leveraged the geometric symmetry of the RRC arena to augment the training dataset through mathematical transformations. In the end, our submission surpassed that of all other participants, even those who employed complex offline RL algorithms and intricate data processing and feature engineering techniques.
    Empowering Precision Medicine: AI-Driven Schizophrenia Diagnosis via EEG Signals: A Comprehensive Review from 2002-2023. (arXiv:2309.12202v1 [eess.SP])
    Schizophrenia (SZ) is a prevalent mental disorder characterized by cognitive, emotional, and behavioral changes. Symptoms of SZ include hallucinations, illusions, delusions, lack of motivation, and difficulties in concentration. Diagnosing SZ involves employing various tools, including clinical interviews, physical examinations, psychological evaluations, the Diagnostic and Statistical Manual of Mental Disorders (DSM), and neuroimaging techniques. Electroencephalography (EEG) recording is a significant functional neuroimaging modality that provides valuable insights into brain function during SZ. However, EEG signal analysis poses challenges for neurologists and scientists due to the presence of artifacts, long-term recordings, and the utilization of multiple channels. To address these challenges, researchers have introduced artificial intelligence (AI) techniques, encompassing conventional machine learning (ML) and deep learning (DL) methods, to aid in SZ diagnosis. This study reviews papers focused on SZ diagnosis utilizing EEG signals and AI methods. The introduction section provides a comprehensive explanation of SZ diagnosis methods and intervention techniques. Subsequently, review papers in this field are discussed, followed by an introduction to the AI methods employed for SZ diagnosis and a summary of relevant papers presented in tabular form. Additionally, this study reports on the most significant challenges encountered in SZ diagnosis, as identified through a review of papers in this field. Future directions to overcome these challenges are also addressed. The discussion section examines the specific details of each paper, culminating in the presentation of conclusions and findings.
    Hyena Neural Operator for Partial Differential Equations. (arXiv:2306.16524v2 [cs.LG] UPDATED)
    Numerically solving partial differential equations typically requires fine discretization to resolve necessary spatiotemporal scales, which can be computationally expensive. Recent advances in deep learning have provided a new approach to solving partial differential equations that involves the use of neural operators. Neural operators are neural network architectures that learn mappings between function spaces and have the capability to solve partial differential equations based on data. This study utilizes a novel neural operator called Hyena, which employs a long convolutional filter that is parameterized by a multilayer perceptron. The Hyena operator is an operation that enjoys sub-quadratic complexity and state space model to parameterize long convolution that enjoys a global receptive field. This mechanism enhances the model's comprehension of the input's context and enables data-dependent weight for different partial differential equations instances. To measure how effective the layers are in solving partial differential equations, we conduct experiments on Diffusion-Reaction equation and Navier Stokes equation. Our findings indicate Hyena Neural operator can serve as an efficient and accurate model for learning partial differential equations solution operator. The data and code used can be found at: https://github.com/Saupatil07/Hyena-Neural-Operator  ( 2 min )
    ALI-DPFL: Differentially Private Federated Learning with Adaptive Local Iterations. (arXiv:2308.10457v2 [cs.LG] UPDATED)
    Federated Learning (FL) is a distributed machine learning technique that allows model training among multiple devices or organizations by sharing training parameters instead of raw data. However, adversaries can still infer individual information through inference attacks (e.g. differential attacks) on these training parameters. As a result, Differential Privacy (DP) has been widely used in FL to prevent such attacks. We consider differentially private federated learning in a resource-constrained scenario, where both privacy budget and communication round are constrained. By theoretically analyzing the convergence, we can find the optimal number of differentially private local iterations for clients between any two sequential global updates. Based on this, we design an algorithm of differentially private federated learning with adaptive local iterations (ALI-DPFL). We experiment our algorithm on the FashionMNIST and CIFAR10 datasets, and demonstrate significantly better performances than previous work in the resource-constraint scenario.
    Weakly supervised learning for pattern classification in serial femtosecond crystallography. (arXiv:2309.04474v2 [cond-mat.mtrl-sci] UPDATED)
    Serial femtosecond crystallography at X-ray free electron laser facilities opens a new era for the determination of crystal structure. However, the data processing of those experiments is facing unprecedented challenge, because the total number of diffraction patterns needed to determinate a high-resolution structure is huge. Machine learning methods are very likely to play important roles in dealing with such a large volume of data. Convolutional neural networks have made a great success in the field of pattern classification, however, training of the networks need very large datasets with labels. Th is heavy dependence on labeled datasets will seriously restrict the application of networks, because it is very costly to annotate a large number of diffraction patterns. In this article we present our job on the classification of diffraction pattern by weakly supervised algorithms, with the aim of reducing as much as possible the size of the labeled dataset required for training. Our result shows that weakly supervised methods can significantly reduce the need for the number of labeled patterns while achieving comparable accuracy to fully supervised methods.
    Federated Learning for Medical Applications: A Taxonomy, Current Trends, Challenges, and Future Research Directions. (arXiv:2208.03392v4 [cs.LG] UPDATED)
    With the advent of the IoT, AI and ML/DL algorithms, the landscape of data-driven medical applications has emerged as a promising avenue for designing robust and scalable diagnostic and prognostic models from medical data. Consequently, the realm of data-driven medical applications has garnered significant attention spanning academia and industry, ushering in marked enhancements in healthcare delivery quality. Despite these strides, the adoption of AI-driven medical applications remains hindered by formidable challenges, including the arduous task of meeting security, privacy, and quality of service (QoS) standards. Recent developments in federated learning have made it possible to train complex machine-learned models in a distributed manner and has become an active research domain, particularly processing the medical data at the edge of the network in a decentralized way to preserve privacy and address security concerns. To this end, this survey paper highlights the current and future of FL technology in medical applications where data sharing is a significant burden. We delve into the contemporary research trends and their outcomes, unravelling the intricacies of designing reliable and scalable FL models. Our survey outlines the foundational statistical predicaments of FL, confronts device-related obstacles, delves into security challenges, and navigates the intricate terrain of privacy concerns, all while spotlighting its transformative potential within the medical domain. A primary focus of our study rests on medical applications, where we underscore the weighty burden of global cancer and illuminate the potency of FL in engendering computer-aided diagnosis tools that address this challenge with heightened efficacy.
    Drifter: Efficient Online Feature Monitoring for Improved Data Integrity in Large-Scale Recommendation Systems. (arXiv:2309.08617v2 [cs.IR] UPDATED)
    Real-world production systems often grapple with maintaining data quality in large-scale, dynamic streams. We introduce Drifter, an efficient and lightweight system for online feature monitoring and verification in recommendation use cases. Drifter addresses limitations of existing methods by delivering agile, responsive, and adaptable data quality monitoring, enabling real-time root cause analysis, drift detection and insights into problematic production events. Integrating state-of-the-art online feature ranking for sparse data and anomaly detection ideas, Drifter is highly scalable and resource-efficient, requiring only two threads and less than a gigabyte of RAM per production deployments that handle millions of instances per minute. Evaluation on real-world data sets demonstrates Drifter's effectiveness in alerting and mitigating data quality issues, substantially improving reliability and performance of real-time live recommender systems.
    Improving Behavioural Cloning with Positive Unlabeled Learning. (arXiv:2301.11734v2 [cs.LG] UPDATED)
    Learning control policies offline from pre-recorded datasets is a promising avenue for solving challenging real-world problems. However, available datasets are typically of mixed quality, with a limited number of the trajectories that we would consider as positive examples; i.e., high-quality demonstrations. Therefore, we propose a novel iterative learning algorithm for identifying expert trajectories in unlabeled mixed-quality robotics datasets given a minimal set of positive examples, surpassing existing algorithms in terms of accuracy. We show that applying behavioral cloning to the resulting filtered dataset outperforms several competitive offline reinforcement learning and imitation learning baselines. We perform experiments on a range of simulated locomotion tasks and on two challenging manipulation tasks on a real robotic system; in these experiments, our method showcases state-of-the-art performance. Our website: \url{https://sites.google.com/view/offline-policy-learning-pubc}.
    Variational Connectionist Temporal Classification for Order-Preserving Sequence Modeling. (arXiv:2309.11983v1 [cs.LG])
    Connectionist temporal classification (CTC) is commonly adopted for sequence modeling tasks like speech recognition, where it is necessary to preserve order between the input and target sequences. However, CTC is only applied to deterministic sequence models, where the latent space is discontinuous and sparse, which in turn makes them less capable of handling data variability when compared to variational models. In this paper, we integrate CTC with a variational model and derive loss functions that can be used to train more generalizable sequence models that preserve order. Specifically, we derive two versions of the novel variational CTC based on two reasonable assumptions, the first being that the variational latent variables at each time step are conditionally independent; and the second being that these latent variables are Markovian. We show that both loss functions allow direct optimization of the variational lower bound for the model log-likelihood, and present computationally tractable forms for implementing them.
    PrNet: A Neural Network for Correcting Pseudoranges to Improve Positioning with Android Raw GNSS Measurements. (arXiv:2309.12204v1 [cs.LG])
    We present a neural network for mitigating pseudorange bias to improve localization performance with data collected from Android smartphones. We represent pseudorange bias using a pragmatic satellite-wise Multiple Layer Perceptron (MLP), the inputs of which are six satellite-receiver-context-related features derived from Android raw Global Navigation Satellite System (GNSS) measurements. To supervise the training process, we carefully calculate the target values of pseudorange bias using location ground truth and smoothing techniques and optimize a loss function containing the estimation residuals of smartphone clock bias. During the inference process, we employ model-based localization engines to compute locations with pseudoranges corrected by the neural network. Consequently, this hybrid pipeline can attend to both pseudorange bias and noise. We evaluate the framework on an open dataset and consider four application scenarios for investigating fingerprinting and cross-trace localization in rural and urban areas. Extensive experiments demonstrate that the proposed framework outperforms model-based and state-of-the-art data-driven approaches.
    Model-based Deep Learning for High-Dimensional Periodic Structures. (arXiv:2309.12223v1 [eess.SP])
    This work presents a deep learning surrogate model for the fast simulation of high-dimensional frequency selective surfaces. We consider unit-cells which are built as multiple concatenated stacks of screens and their design requires the control over many geometrical degrees of freedom. Thanks to the introduction of physical insight into the model, it can produce accurate predictions of the S-parameters of a certain structure after training with a reduced dataset.The proposed model is highly versatile and it can be used with any kind of frequency selective surface, based on either perforations or patches of any arbitrary geometry. Numeric examples are presented here for the case of frequency selective surfaces composed of screens with rectangular perforations, showing an excellent agreement between the predicted performance and such obtained with a full-wave simulator.
    Domain-knowledge Inspired Pseudo Supervision (DIPS) for Unsupervised Image-to-Image Translation Models to Support Cross-Domain Classification. (arXiv:2303.10310v3 [cs.CV] UPDATED)
    The ability to classify images is dependent on having access to large labeled datasets and testing on data from the same domain that the model can train on. Classification becomes more challenging when dealing with new data from a different domain, where gathering and especially labeling a larger image dataset for retraining a classification model requires a labor-intensive human effort. Cross-domain classification frameworks were developed to handle this data domain shift problem by utilizing unsupervised image-to-image translation models to translate an input image from the unlabeled domain to the labeled domain. The problem with these unsupervised models lies in their unsupervised nature. For lack of annotations, it is not possible to use the traditional supervised metrics to evaluate these translation models to pick the best-saved checkpoint model. This paper introduces a new method called Domain-knowledge Inspired Pseudo Supervision (DIPS) which utilizes domain-informed Gaussian Mixture Models to generate pseudo annotations to enable the use of traditional supervised metrics. This method was designed specifically to support cross-domain classification applications contrary to other typically used metrics such as the FID which were designed to evaluate the model in terms of the quality of the generated image from a human-eye perspective. DIPS proves its effectiveness by outperforming various GAN evaluation metrics, including FID, when selecting the optimal saved checkpoint model. It is also evaluated against truly supervised metrics. Furthermore, DIPS showcases its robustness and interpretability by demonstrating a strong correlation with truly supervised metrics, highlighting its superiority over existing state-of-the-art alternatives. The code and data to replicate the results can be found on the official Github repository: https://github.com/Hindawi91/DIPS
    DREAM: A Dynamic Scheduler for Dynamic Real-time Multi-model ML Workloads. (arXiv:2212.03414v2 [cs.DC] UPDATED)
    Emerging real-time multi-model ML (RTMM) workloads such as AR/VR and drone control involve dynamic behaviors in various granularity; task, model, and layers within a model. Such dynamic behaviors introduce new challenges to the system software in an ML system since the overall system load is not completely predictable, unlike traditional ML workloads. In addition, RTMM workloads require real-time processing, involve highly heterogeneous models, and target resource-constrained devices. Under such circumstances, developing an effective scheduler gains more importance to better utilize underlying hardware considering the unique characteristics of RTMM workloads. Therefore, we propose a new scheduler, DREAM, which effectively handles various dynamicity in RTMM workloads targeting multi-accelerator systems. DREAM quantifies the unique requirements for RTMM workloads and utilizes the quantified scores to drive scheduling decisions, considering the current system load and other inference jobs on different models and input frames. DREAM utilizes tunable parameters that provide fast and effective adaptivity to dynamic workload changes. In our evaluation of five scenarios of RTMM workload, DREAM reduces the overall UXCost, which is an equivalent metric of the energy-delay product (EDP) for RTMM defined in the paper, by 32.2% and 50.0% in the geometric mean (up to 80.8% and 97.6%) compared to state-of-the-art baselines, which shows the efficacy of our scheduling methodology.
    SALSA-CLRS: A Sparse and Scalable Benchmark for Algorithmic Reasoning. (arXiv:2309.12253v1 [cs.LG])
    We introduce an extension to the CLRS algorithmic learning benchmark, prioritizing scalability and the utilization of sparse representations. Many algorithms in CLRS require global memory or information exchange, mirrored in its execution model, which constructs fully connected (not sparse) graphs based on the underlying problem. Despite CLRS's aim of assessing how effectively learned algorithms can generalize to larger instances, the existing execution model becomes a significant constraint due to its demanding memory requirements and runtime (hard to scale). However, many important algorithms do not demand a fully connected graph; these algorithms, primarily distributed in nature, align closely with the message-passing paradigm employed by Graph Neural Networks. Hence, we propose SALSA-CLRS, an extension of the current CLRS benchmark specifically with scalability and sparseness in mind. Our approach includes adapted algorithms from the original CLRS benchmark and introduces new problems from distributed and randomized algorithms. Moreover, we perform a thorough empirical evaluation of our benchmark. Code is publicly available at https://github.com/jkminder/SALSA-CLRS.
    Class-wise Classifier Design Capable of Continual Learning using Adaptive Resonance Theory-based Topological Clustering. (arXiv:2203.09879v2 [cs.LG] UPDATED)
    This paper proposes a supervised classification algorithm capable of continual learning by utilizing an Adaptive Resonance Theory (ART)-based growing self-organizing clustering algorithm. The ART-based clustering algorithm is theoretically capable of continual learning, and the proposed algorithm independently applies it to each class of training data for generating classifiers. Whenever an additional training data set from a new class is given, a new ART-based clustering will be defined in a different learning space. Thanks to the above-mentioned features, the proposed algorithm realizes continual learning capability. Simulation experiments showed that the proposed algorithm has superior classification performance compared with state-of-the-art clustering-based classification algorithms capable of continual learning.
    Decision-making and control with diffractive optical networks. (arXiv:2212.11278v3 [cs.LG] UPDATED)
    The ultimate goal of artificial intelligence is to mimic the human brain to perform decision-making and control directly from high-dimensional sensory input. Diffractive optical networks provide a promising solution for implementing artificial intelligence with high-speed and low-power consumption. Most of the reported diffractive optical networks focus on single or multiple tasks that do not involve environmental interaction, such as object recognition and image classification. In contrast, the networks capable of performing decision-making and control have not yet been developed to our knowledge. Here, we propose using deep reinforcement learning to implement diffractive optical networks that imitate human-level decision-making and control capability. Such networks taking advantage of a residual architecture, allow for finding optimal control policies through interaction with the environment and can be readily implemented with existing optical devices. The superior performance of these networks is verified by engaging three types of classic games, Tic-Tac-Toe, Super Mario Bros., and Car Racing. Finally, we present an experimental demonstration of playing Tic-Tac-Toe by leveraging diffractive optical networks based on a spatial light modulator. Our work represents a solid step forward in advancing diffractive optical networks, which promises a fundamental shift from the target-driven control of a pre-designed state for simple recognition or classification tasks to the high-level sensory capability of artificial intelligence. It may find exciting applications in autonomous driving, intelligent robots, and intelligent manufacturing.
    Persistent Homology of the Multiscale Clustering Filtration. (arXiv:2305.04281v2 [math.AT] UPDATED)
    In many applications in data clustering, it is desirable to find not just a single partition into clusters but a sequence of partitions describing the data at different scales, or levels of coarseness. A natural problem then is to analyse and compare the (not necessarily hierarchical) sequences of partitions that underpin such multiscale descriptions of data. Here, we introduce a filtration of abstract simplicial complexes, denoted the Multiscale Clustering Filtration (MCF), which encodes arbitrary patterns of cluster assignments across scales, and we prove that the MCF produces stable persistence diagrams. We then show that the zero-dimensional persistent homology of the MCF measures the degree of hierarchy in the sequence of partitions, and that the higher-dimensional persistent homology tracks the emergence and resolution of conflicts between cluster assignments across the sequence of partitions. To broaden the theoretical foundations of the MCF, we also provide an equivalent construction via a nerve complex filtration, and we show that in the hierarchical case, the MCF reduces to a Vietoris-Rips filtration of an ultrametric space. We briefly illustrate how the MCF can serve to characterise multiscale clustering structures in numerical experiments on synthetic data.
    Self-supervised learning unveils change in urban housing from street-level images. (arXiv:2309.11354v2 [cs.CV] UPDATED)
    Cities around the world face a critical shortage of affordable and decent housing. Despite its critical importance for policy, our ability to effectively monitor and track progress in urban housing is limited. Deep learning-based computer vision methods applied to street-level images have been successful in the measurement of socioeconomic and environmental inequalities but did not fully utilize temporal images to track urban change as time-varying labels are often unavailable. We used self-supervised methods to measure change in London using 15 million street images taken between 2008 and 2021. Our novel adaptation of Barlow Twins, Street2Vec, embeds urban structure while being invariant to seasonal and daily changes without manual annotations. It outperformed generic embeddings, successfully identified point-level change in London's housing supply from street-level images, and distinguished between major and minor change. This capability can provide timely information for urban planning and policy decisions toward more liveable, equitable, and sustainable cities.
    Environment-biased Feature Ranking for Novelty Detection Robustness. (arXiv:2309.12301v1 [cs.LG])
    We tackle the problem of robust novelty detection, where we aim to detect novelties in terms of semantic content while being invariant to changes in other, irrelevant factors. Specifically, we operate in a setup with multiple environments, where we determine the set of features that are associated more with the environments, rather than to the content relevant for the task. Thus, we propose a method that starts with a pretrained embedding and a multi-env setup and manages to rank the features based on their environment-focus. First, we compute a per-feature score based on the feature distribution variance between envs. Next, we show that by dropping the highly scored ones, we manage to remove spurious correlations and improve the overall performance by up to 6%, both in covariance and sub-population shift cases, both for a real and a synthetic benchmark, that we introduce for this task.
    CoDi: Co-evolving Contrastive Diffusion Models for Mixed-type Tabular Synthesis. (arXiv:2304.12654v2 [cs.LG] UPDATED)
    With growing attention to tabular data these days, the attempt to apply a synthetic table to various tasks has been expanded toward various scenarios. Owing to the recent advances in generative modeling, fake data generated by tabular data synthesis models become sophisticated and realistic. However, there still exists a difficulty in modeling discrete variables (columns) of tabular data. In this work, we propose to process continuous and discrete variables separately (but being conditioned on each other) by two diffusion models. The two diffusion models are co-evolved during training by reading conditions from each other. In order to further bind the diffusion models, moreover, we introduce a contrastive learning method with a negative sampling method. In our experiments with 11 real-world tabular datasets and 8 baseline methods, we prove the efficacy of the proposed method, called CoDi.
    On the different regimes of Stochastic Gradient Descent. (arXiv:2309.10688v2 [cs.LG] UPDATED)
    Modern deep networks are trained with stochastic gradient descent (SGD) whose key parameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the `temperature' $T\equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here we resolve these questions for a teacher-student perceptron classification model, and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: $\textit{(i)}$ a noise-dominated SGD governed by temperature, $\textit{(ii)}$ a large-first-step-dominated SGD and $\textit{(iii)}$ GD. These different phases also corresponds to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$ and $\textit{(ii)}$ scale with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.
    Analysis and Comparison of Classification Metrics. (arXiv:2209.05355v4 [cs.LG] UPDATED)
    A variety of different performance metrics are commonly used in the machine learning literature for the evaluation of classification systems. Some of the most common ones for measuring quality of hard decisions are standard and balanced accuracy, standard and balanced error rate, F-beta score, and Matthews correlation coefficient (MCC). In this document, we review the definition of these and other metrics and compare them with the expected cost (EC), a metric introduced in every statistical learning course but rarely used in the machine learning literature. We show that both the standard and balanced error rates are special cases of the EC. Further, we show its relation with F-beta score and MCC and argue that EC is superior to these traditional metrics for being based on first principles from statistics, and for being more general, interpretable, and adaptable to any application scenario. The metrics mentioned above measure the quality of hard decisions. Yet, most modern classification systems output continuous scores for the classes which we may want to evaluate directly. Metrics for measuring the quality of system scores include the area under the ROC curve, equal error rate, cross-entropy, Brier score, and Bayes EC or Bayes risk, among others. The last three metrics are special cases of a family of metrics given by the expected value of proper scoring rules (PSRs). We review the theory behind these metrics, showing that they are a principled way to measure the quality of the posterior probabilities produced by a system. Finally, we show how to use these metrics to compute a system's calibration loss and compare this metric with the widely-used expected calibration error (ECE), arguing that calibration loss based on PSRs is superior to the ECE for being more interpretable, more general, and directly applicable to the multi-class case, among other reasons.
    Bridging the Gap: Learning Pace Synchronization for Open-World Semi-Supervised Learning. (arXiv:2309.11930v1 [cs.LG])
    In open-world semi-supervised learning, a machine learning model is tasked with uncovering novel categories from unlabeled data while maintaining performance on seen categories from labeled data. The central challenge is the substantial learning gap between seen and novel categories, as the model learns the former faster due to accurate supervisory information. To address this, we introduce 1) an adaptive margin loss based on estimated class distribution, which encourages a large negative margin for samples in seen classes, to synchronize learning paces, and 2) pseudo-label contrastive clustering, which pulls together samples which are likely from the same class in the output space, to enhance novel class discovery. Our extensive evaluations on multiple datasets demonstrate that existing models still hinder novel class learning, whereas our approach strikingly balances both seen and novel classes, achieving a remarkable 3% average accuracy increase on the ImageNet dataset compared to the prior state-of-the-art. Additionally, we find that fine-tuning the self-supervised pre-trained backbone significantly boosts performance over the default in prior literature. After our paper is accepted, we will release the code.
    Convergence and Recovery Guarantees of Unsupervised Neural Networks for Inverse Problems. (arXiv:2309.12128v1 [cs.LG])
    Neural networks have become a prominent approach to solve inverse problems in recent years. While a plethora of such methods was developed to solve inverse problems empirically, we are still lacking clear theoretical guarantees for these methods. On the other hand, many works proved convergence to optimal solutions of neural networks in a more general setting using overparametrization as a way to control the Neural Tangent Kernel. In this work we investigate how to bridge these two worlds and we provide deterministic convergence and recovery guarantees for the class of unsupervised feedforward multilayer neural networks trained to solve inverse problems. We also derive overparametrization bounds under which a two-layers Deep Inverse Prior network with smooth activation function will benefit from our guarantees.
    Robust Approximation Algorithms for Non-monotone $k$-Submodular Maximization under a Knapsack Constraint. (arXiv:2309.12025v1 [cs.DS])
    The problem of non-monotone $k$-submodular maximization under a knapsack constraint ($\kSMK$) over the ground set size $n$ has been raised in many applications in machine learning, such as data summarization, information propagation, etc. However, existing algorithms for the problem are facing questioning of how to overcome the non-monotone case and how to fast return a good solution in case of the big size of data. This paper introduces two deterministic approximation algorithms for the problem that competitively improve the query complexity of existing algorithms. Our first algorithm, $\LAA$, returns an approximation ratio of $1/19$ within $O(nk)$ query complexity. The second one, $\RLA$, improves the approximation ratio to $1/5-\epsilon$ in $O(nk)$ queries, where $\epsilon$ is an input parameter. Our algorithms are the first ones that provide constant approximation ratios within only $O(nk)$ query complexity for the non-monotone objective. They, therefore, need fewer the number of queries than state-of-the-the-art ones by a factor of $\Omega(\log n)$. Besides the theoretical analysis, we have evaluated our proposed ones with several experiments in some instances: Influence Maximization and Sensor Placement for the problem. The results confirm that our algorithms ensure theoretical quality as the cutting-edge techniques and significantly reduce the number of queries.
    PIE: Simulating Disease Progression via Progressive Image Editing. (arXiv:2309.11745v1 [eess.IV])
    Disease progression simulation is a crucial area of research that has significant implications for clinical diagnosis, prognosis, and treatment. One major challenge in this field is the lack of continuous medical imaging monitoring of individual patients over time. To address this issue, we develop a novel framework termed Progressive Image Editing (PIE) that enables controlled manipulation of disease-related image features, facilitating precise and realistic disease progression simulation. Specifically, we leverage recent advancements in text-to-image generative models to simulate disease progression accurately and personalize it for each patient. We theoretically analyze the iterative refining process in our framework as a gradient descent with an exponentially decayed learning rate. To validate our framework, we conduct experiments in three medical imaging domains. Our results demonstrate the superiority of PIE over existing methods such as Stable Diffusion Walk and Style-Based Manifold Extrapolation based on CLIP score (Realism) and Disease Classification Confidence (Alignment). Our user study collected feedback from 35 veteran physicians to assess the generated progressions. Remarkably, 76.2% of the feedback agrees with the fidelity of the generated progressions. To our best knowledge, PIE is the first of its kind to generate disease progression images meeting real-world standards. It is a promising tool for medical research and clinical practice, potentially allowing healthcare providers to model disease trajectories over time, predict future treatment responses, and improve patient outcomes.
    Federated Learning with Neural Graphical Models. (arXiv:2309.11680v1 [cs.LG])
    Federated Learning (FL) addresses the need to create models based on proprietary data in such a way that multiple clients retain exclusive control over their data, while all benefit from improved model accuracy due to pooled resources. Recently proposed Neural Graphical Models (NGMs) are Probabilistic Graphical models that utilize the expressive power of neural networks to learn complex non-linear dependencies between the input features. They learn to capture the underlying data distribution and have efficient algorithms for inference and sampling. We develop a FL framework which maintains a global NGM model that learns the averaged information from the local NGM models while keeping the training data within the client's environment. Our design, FedNGMs, avoids the pitfalls and shortcomings of neuron matching frameworks like Federated Matched Averaging that suffers from model parameter explosion. Our global model size remains constant throughout the process. In the cases where clients have local variables that are not part of the combined global distribution, we propose a `Stitching' algorithm, which personalizes the global NGM models by merging the additional variables using the client's data. FedNGM is robust to data heterogeneity, large number of participants, and limited communication bandwidth.
    Stock Market Sentiment Classification and Backtesting via Fine-tuned BERT. (arXiv:2309.11979v1 [q-fin.CP])
    With the rapid development of big data and computing devices, low-latency automatic trading platforms based on real-time information acquisition have become the main components of the stock trading market, so the topic of quantitative trading has received widespread attention. And for non-strongly efficient trading markets, human emotions and expectations always dominate market trends and trading decisions. Therefore, this paper starts from the theory of emotion, taking East Money as an example, crawling user comment titles data from its corresponding stock bar and performing data cleaning. Subsequently, a natural language processing model BERT was constructed, and the BERT model was fine-tuned using existing annotated data sets. The experimental results show that the fine-tuned model has different degrees of performance improvement compared to the original model and the baseline model. Subsequently, based on the above model, the user comment data crawled is labeled with emotional polarity, and the obtained label information is combined with the Alpha191 model to participate in regression, and significant regression results are obtained. Subsequently, the regression model is used to predict the average price change for the next five days, and use it as a signal to guide automatic trading. The experimental results show that the incorporation of emotional factors increased the return rate by 73.8\% compared to the baseline during the trading period, and by 32.41\% compared to the original alpha191 model. Finally, we discuss the advantages and disadvantages of incorporating emotional factors into quantitative trading, and give possible directions for further research in the future.
    Safe Hierarchical Reinforcement Learning for CubeSat Task Scheduling Based on Energy Consumption. (arXiv:2309.12004v1 [cs.LG])
    This paper presents a Hierarchical Reinforcement Learning methodology tailored for optimizing CubeSat task scheduling in Low Earth Orbits (LEO). Incorporating a high-level policy for global task distribution and a low-level policy for real-time adaptations as a safety mechanism, our approach integrates the Similarity Attention-based Encoder (SABE) for task prioritization and an MLP estimator for energy consumption forecasting. Integrating this mechanism creates a safe and fault-tolerant system for CubeSat task scheduling. Simulation results validate the Hierarchical Reinforcement Learning superior convergence and task success rate, outperforming both the MADDPG model and traditional random scheduling across multiple CubeSat configurations.  ( 2 min )
    Predictability and Comprehensibility in Post-Hoc XAI Methods: A User-Centered Analysis. (arXiv:2309.11987v1 [cs.LG])
    Post-hoc explainability methods aim to clarify predictions of black-box machine learning models. However, it is still largely unclear how well users comprehend the provided explanations and whether these increase the users ability to predict the model behavior. We approach this question by conducting a user study to evaluate comprehensibility and predictability in two widely used tools: LIME and SHAP. Moreover, we investigate the effect of counterfactual explanations and misclassifications on users ability to understand and predict the model behavior. We find that the comprehensibility of SHAP is significantly reduced when explanations are provided for samples near a model's decision boundary. Furthermore, we find that counterfactual explanations and misclassifications can significantly increase the users understanding of how a machine learning model is making decisions. Based on our findings, we also derive design recommendations for future post-hoc explainability methods with increased comprehensibility and predictability.
    SupeRBNN: Randomized Binary Neural Network Using Adiabatic Superconductor Josephson Devices. (arXiv:2309.12212v1 [cs.ET])
    Adiabatic Quantum-Flux-Parametron (AQFP) is a superconducting logic with extremely high energy efficiency. By employing the distinct polarity of current to denote logic `0' and `1', AQFP devices serve as excellent carriers for binary neural network (BNN) computations. Although recent research has made initial strides toward developing an AQFP-based BNN accelerator, several critical challenges remain, preventing the design from being a comprehensive solution. In this paper, we propose SupeRBNN, an AQFP-based randomized BNN acceleration framework that leverages software-hardware co-optimization to eventually make the AQFP devices a feasible solution for BNN acceleration. Specifically, we investigate the randomized behavior of the AQFP devices and analyze the impact of crossbar size on current attenuation, subsequently formulating the current amplitude into the values suitable for use in BNN computation. To tackle the accumulation problem and improve overall hardware performance, we propose a stochastic computing-based accumulation module and a clocking scheme adjustment-based circuit optimization method. We validate our SupeRBNN framework across various datasets and network architectures, comparing it with implementations based on different technologies, including CMOS, ReRAM, and superconducting RSFQ/ERSFQ. Experimental results demonstrate that our design achieves an energy efficiency of approximately 7.8x10^4 times higher than that of the ReRAM-based BNN framework while maintaining a similar level of model accuracy. Furthermore, when compared with superconductor-based counterparts, our framework demonstrates at least two orders of magnitude higher energy efficiency.  ( 3 min )
    BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model. (arXiv:2309.11568v1 [cs.AI])
    We introduce the Bittensor Language Model, called "BTLM-3B-8K", a new state-of-the-art 3 billion parameter open-source language model. BTLM-3B-8K was trained on 627B tokens from the SlimPajama dataset with a mixture of 2,048 and 8,192 context lengths. BTLM-3B-8K outperforms all existing 3B parameter models by 2-5.5% across downstream tasks. BTLM-3B-8K is even competitive with some 7B parameter models. Additionally, BTLM-3B-8K provides excellent long context performance, outperforming MPT-7B-8K and XGen-7B-8K on tasks up to 8,192 context length. We trained the model on a cleaned and deduplicated SlimPajama dataset; aggressively tuned the \textmu P hyperparameters and schedule; used ALiBi position embeddings; and adopted the SwiGLU nonlinearity. On Hugging Face, the most popular models have 7B parameters, indicating that users prefer the quality-size ratio of 7B models. Compacting the 7B parameter model to one with 3B parameters, with little performance impact, is an important milestone. BTLM-3B-8K needs only 3GB of memory with 4-bit precision and takes 2.5x less inference compute than 7B models, helping to open up access to a powerful language model on mobile and edge devices. BTLM-3B-8K is available under an Apache 2.0 license on Hugging Face: https://huggingface.co/cerebras/btlm-3b-8k-base.
    AdBooster: Personalized Ad Creative Generation using Stable Diffusion Outpainting. (arXiv:2309.11507v1 [cs.IR])
    In digital advertising, the selection of the optimal item (recommendation) and its best creative presentation (creative optimization) have traditionally been considered separate disciplines. However, both contribute significantly to user satisfaction, underpinning our assumption that it relies on both an item's relevance and its presentation, particularly in the case of visual creatives. In response, we introduce the task of {\itshape Generative Creative Optimization (GCO)}, which proposes the use of generative models for creative generation that incorporate user interests, and {\itshape AdBooster}, a model for personalized ad creatives based on the Stable Diffusion outpainting architecture. This model uniquely incorporates user interests both during fine-tuning and at generation time. To further improve AdBooster's performance, we also introduce an automated data augmentation pipeline. Through our experiments on simulated data, we validate AdBooster's effectiveness in generating more relevant creatives than default product images, showing its potential of enhancing user engagement.
    State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards. (arXiv:2102.11941v2 [cs.LG] UPDATED)
    A common formulation of constrained reinforcement learning involves multiple rewards that must individually accumulate to given thresholds. In this class of problems, we show a simple example in which the desired optimal policy cannot be induced by any weighted linear combination of rewards. Hence, there exist constrained reinforcement learning problems for which neither regularized nor classical primal-dual methods yield optimal policies. This work addresses this shortcoming by augmenting the state with Lagrange multipliers and reinterpreting primal-dual methods as the portion of the dynamics that drives the multipliers evolution. This approach provides a systematic state augmentation procedure that is guaranteed to solve reinforcement learning problems with constraints. Thus, as we illustrate by an example, while previous methods can fail at finding optimal policies, running the dual dynamics while executing the augmented policy yields an algorithm that provably samples actions from the optimal policy.  ( 2 min )
    Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization. (arXiv:2307.02108v2 [cs.LG] UPDATED)
    Simple regret minimization is a critical problem in learning optimal treatment assignment policies across various domains, including healthcare and e-commerce. However, it remains understudied in the contextual bandit setting. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit settings, with the flexibility to be adapted for cumulative regret minimization (with near-optimal minimax guarantees) and simple regret minimization (with SOTA guarantees). Furthermore, our algorithms adapt to model misspecification and extend to the continuous arm settings. These advantages come from constructing and relying on "conformal arm sets" (CASs), which provide a set of arms at every context that encompass the context-specific optimal arm with some probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted by a negative result, which shows that an algorithm can't achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.
    SAM-OCTA: A Fine-Tuning Strategy for Applying Foundation Model to OCTA Image Segmentation Tasks. (arXiv:2309.11758v1 [cs.CV])
    In the analysis of optical coherence tomography angiography (OCTA) images, the operation of segmenting specific targets is necessary. Existing methods typically train on supervised datasets with limited samples (approximately a few hundred), which can lead to overfitting. To address this, the low-rank adaptation technique is adopted for foundation model fine-tuning and proposed corresponding prompt point generation strategies to process various segmentation tasks on OCTA datasets. This method is named SAM-OCTA and has been experimented on the publicly available OCTA-500 dataset. While achieving state-of-the-art performance metrics, this method accomplishes local vessel segmentation as well as effective artery-vein segmentation, which was not well-solved in previous works. The code is available at: https://github.com/ShellRedia/SAM-OCTA.
    Graph Neural Networks for the Offline Nanosatellite Task Scheduling Problem. (arXiv:2303.13773v2 [cs.LG] UPDATED)
    This study investigates how to schedule nanosatellite tasks more efficiently using Graph Neural Networks (GNNs). In the Offline Nanosatellite Task Scheduling (ONTS) problem, the goal is to find the optimal schedule for tasks to be carried out in orbit while taking into account Quality-of-Service (QoS) considerations such as priority, minimum and maximum activation events, execution time-frames, periods, and execution windows, as well as constraints on the satellite's power resources and the complexity of energy harvesting and management. The ONTS problem has been approached using conventional mathematical formulations and exact methods, but their applicability to challenging cases of the problem is limited. This study examines the use of GNNs in this context, which has been effectively applied to optimization problems such as the traveling salesman, scheduling, and facility placement problems. More specifically, we investigate whether GNNs can learn the complex structure of the ONTS problem with respect to feasibility and optimality of candidate solutions. Furthermore, we evaluate using GNN-based heuristic solutions to provide better solutions (w.r.t. the objective value) to the ONTS problem and reduce the optimization cost. Our experiments show that GNNs are not only able to learn feasibility and optimality for instances of the ONTS problem, but they can generalize to harder instances than those seen during training. Furthermore, the GNN-based heuristics improved the expected objective value of the best solution found under the time limit in 45%, and reduced the expected time to find a feasible solution in 35%, when compared to the SCIP (Solving Constraint Integer Programs) solver in its off-the-shelf configuration
    Crowdotic: A Privacy-Preserving Hospital Waiting Room Crowd Density Estimation with Non-speech Audio. (arXiv:2309.10280v2 [cs.SD] UPDATED)
    Privacy-preserving crowd density analysis finds application across a wide range of scenarios, substantially enhancing smart building operation and management while upholding privacy expectations in various spaces. We propose a non-speech audio-based approach for crowd analytics, leveraging a transformer-based model. Our results demonstrate that non-speech audio alone can be used to conduct such analysis with remarkable accuracy. To the best of our knowledge, this is the first time when non-speech audio signals are proposed for predicting occupancy. As far as we know, there has been no other similar approach of its kind prior to this. To accomplish this, we deployed our sensor-based platform in the waiting room of a large hospital with IRB approval over a period of several months to capture non-speech audio and thermal images for the training and evaluation of our models. The proposed non-speech-based approach outperformed the thermal camera-based model and all other baselines. In addition to demonstrating superior performance without utilizing speech audio, we conduct further analysis using differential privacy techniques to provide additional privacy guarantees. Overall, our work demonstrates the viability of employing non-speech audio data for accurate occupancy estimation, while also ensuring the exclusion of speech-related content and providing robust privacy protections through differential privacy guarantees.
    $\lambda$-AC: Learning latent decision-aware models for reinforcement learning in continuous state-spaces. (arXiv:2306.17366v2 [cs.LG] UPDATED)
    The idea of decision-aware model learning, that models should be accurate where it matters for decision-making, has gained prominence in model-based reinforcement learning. While promising theoretical results have been established, the empirical performance of algorithms leveraging a decision-aware loss has been lacking, especially in continuous control problems. In this paper, we present a study on the necessary components for decision-aware reinforcement learning models and we showcase design choices that enable well-performing algorithms. To this end, we provide a theoretical and empirical investigation into prominent algorithmic ideas in the field. We highlight that empirical design decisions established in the MuZero line of works are vital to achieving good performance for related algorithms, and we showcase differences in behavior between different instantiations of value-aware algorithms in stochastic environments. Using these insights, we propose the Latent Model-Based Decision-Aware Actor-Critic framework ($\lambda$-AC) for decision-aware model-based reinforcement learning in continuous state-spaces and highlight important design choices in different environments.
    Optimal Conditional Inference in Adaptive Experiments. (arXiv:2309.12162v1 [stat.ME])
    We study batched bandit experiments and consider the problem of inference conditional on the realized stopping time, assignment probabilities, and target parameter, where all of these may be chosen adaptively using information up to the last batch of the experiment. Absent further restrictions on the experiment, we show that inference using only the results of the last batch is optimal. When the adaptive aspects of the experiment are known to be location-invariant, in the sense that they are unchanged when we shift all batch-arm means by a constant, we show that there is additional information in the data, captured by one additional linear function of the batch-arm means. In the more restrictive case where the stopping time, assignment probabilities, and target parameter are known to depend on the data only through a collection of polyhedral events, we derive computationally tractable and optimal conditional inference procedures.  ( 2 min )
    t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators. (arXiv:2309.12237v1 [cs.CR])
    Presentation attack (spoofing) detection (PAD) typically operates alongside biometric verification to improve reliablity in the face of spoofing attacks. Even though the two sub-systems operate in tandem to solve the single task of reliable biometric verification, they address different detection tasks and are hence typically evaluated separately. Evidence shows that this approach is suboptimal. We introduce a new metric for the joint evaluation of PAD solutions operating in situ with biometric verification. In contrast to the tandem detection cost function proposed recently, the new tandem equal error rate (t-EER) is parameter free. The combination of two classifiers nonetheless leads to a \emph{set} of operating points at which false alarm and miss rates are equal and also dependent upon the prevalence of attacks. We therefore introduce the \emph{concurrent} t-EER, a unique operating point which is invariable to the prevalence of attacks. Using both modality (and even application) agnostic simulated scores, as well as real scores for a voice biometrics application, we demonstrate application of the t-EER to a wide range of biometric system evaluations under attack. The proposed approach is a strong candidate metric for the tandem evaluation of PAD systems and biometric comparators.
    ForceSight: Text-Guided Mobile Manipulation with Visual-Force Goals. (arXiv:2309.12312v1 [cs.RO])
    We present ForceSight, a system for text-guided mobile manipulation that predicts visual-force goals using a deep neural network. Given a single RGBD image combined with a text prompt, ForceSight determines a target end-effector pose in the camera frame (kinematic goal) and the associated forces (force goal). Together, these two components form a visual-force goal. Prior work has demonstrated that deep models outputting human-interpretable kinematic goals can enable dexterous manipulation by real robots. Forces are critical to manipulation, yet have typically been relegated to lower-level execution in these systems. When deployed on a mobile manipulator equipped with an eye-in-hand RGBD camera, ForceSight performed tasks such as precision grasps, drawer opening, and object handovers with an 81% success rate in unseen environments with object instances that differed significantly from the training data. In a separate experiment, relying exclusively on visual servoing and ignoring force goals dropped the success rate from 90% to 45%, demonstrating that force goals can significantly enhance performance. The appendix, videos, code, and trained models are available at https://force-sight.github.io/.
    A Multi-label Classification Approach to Increase Expressivity of EMG-based Gesture Recognition. (arXiv:2309.12217v1 [eess.SP])
    Objective: The objective of the study is to efficiently increase the expressivity of surface electromyography-based (sEMG) gesture recognition systems. Approach: We use a problem transformation approach, in which actions were subset into two biomechanically independent components - a set of wrist directions and a set of finger modifiers. To maintain fast calibration time, we train models for each component using only individual gestures, and extrapolate to the full product space of combination gestures by generating synthetic data. We collected a supervised dataset with high-confidence ground truth labels in which subjects performed combination gestures while holding a joystick, and conducted experiments to analyze the impact of model architectures, classifier algorithms, and synthetic data generation strategies on the performance of the proposed approach. Main Results: We found that a problem transformation approach using a parallel model architecture in combination with a non-linear classifier, along with restricted synthetic data generation, shows promise in increasing the expressivity of sEMG-based gestures with a short calibration time. Significance: sEMG-based gesture recognition has applications in human-computer interaction, virtual reality, and the control of robotic and prosthetic devices. Existing approaches require exhaustive model calibration. The proposed approach increases expressivity without requiring users to demonstrate all combination gesture classes. Our results may be extended to larger gesture vocabularies and more complicated model architectures.  ( 2 min )
    Smooth ECE: Principled Reliability Diagrams via Kernel Smoothing. (arXiv:2309.12236v1 [cs.LG])
    Calibration measures and reliability diagrams are two fundamental tools for measuring and interpreting the calibration of probabilistic predictors. Calibration measures quantify the degree of miscalibration, and reliability diagrams visualize the structure of this miscalibration. However, the most common constructions of reliability diagrams and calibration measures -- binning and ECE -- both suffer from well-known flaws (e.g. discontinuity). We show that a simple modification fixes both constructions: first smooth the observations using an RBF kernel, then compute the Expected Calibration Error (ECE) of this smoothed function. We prove that with a careful choice of bandwidth, this method yields a calibration measure that is well-behaved in the sense of (B{\l}asiok, Gopalan, Hu, and Nakkiran 2023a) -- a consistent calibration measure. We call this measure the SmoothECE. Moreover, the reliability diagram obtained from this smoothed function visually encodes the SmoothECE, just as binned reliability diagrams encode the BinnedECE. We also provide a Python package with simple, hyperparameter-free methods for measuring and plotting calibration: `pip install relplot\`.
    Uplift vs. predictive modeling: a theoretical analysis. (arXiv:2309.12036v1 [cs.LG])
    Despite the growing popularity of machine-learning techniques in decision-making, the added value of causal-oriented strategies with respect to pure machine-learning approaches has rarely been quantified in the literature. These strategies are crucial for practitioners in various domains, such as marketing, telecommunications, health care and finance. This paper presents a comprehensive treatment of the subject, starting from firm theoretical foundations and highlighting the parameters that influence the performance of the uplift and predictive approaches. The focus of the paper is on a binary outcome case and a binary action, and the paper presents a theoretical analysis of uplift modeling, comparing it with the classical predictive approach. The main research contributions of the paper include a new formulation of the measure of profit, a formal proof of the convergence of the uplift curve to the measure of profit ,and an illustration, through simulations, of the conditions under which predictive approaches still outperform uplift modeling. We show that the mutual information between the features and the outcome plays a significant role, along with the variance of the estimators, the distribution of the potential outcomes and the underlying costs and benefits of the treatment and the outcome.  ( 2 min )
    Enabling Quartile-based Estimated-Mean Gradient Aggregation As Baseline for Federated Image Classifications. (arXiv:2309.12267v1 [cs.CR])
    Federated Learning (FL) has revolutionized how we train deep neural networks by enabling decentralized collaboration while safeguarding sensitive data and improving model performance. However, FL faces two crucial challenges: the diverse nature of data held by individual clients and the vulnerability of the FL system to security breaches. This paper introduces an innovative solution named Estimated Mean Aggregation (EMA) that not only addresses these challenges but also provides a fundamental reference point as a $\mathsf{baseline}$ for advanced aggregation techniques in FL systems. EMA's significance lies in its dual role: enhancing model security by effectively handling malicious outliers through trimmed means and uncovering data heterogeneity to ensure that trained models are adaptable across various client datasets. Through a wealth of experiments, EMA consistently demonstrates high accuracy and area under the curve (AUC) compared to alternative methods, establishing itself as a robust baseline for evaluating the effectiveness and security of FL aggregation methods. EMA's contributions thus offer a crucial step forward in advancing the efficiency, security, and versatility of decentralized deep learning in the context of FL.  ( 2 min )
    LongLoRA: Efficient Fine-tuning of Long-Context Large Language Models. (arXiv:2309.12307v1 [cs.CL])
    We present LongLoRA, an efficient fine-tuning approach that extends the context sizes of pre-trained large language models (LLMs), with limited computation cost. Typically, training LLMs with long context sizes is computationally expensive, requiring extensive training hours and GPU resources. For example, training on the context length of 8192 needs 16x computational costs in self-attention layers as that of 2048. In this paper, we speed up the context extension of LLMs in two aspects. On the one hand, although dense global attention is needed during inference, fine-tuning the model can be effectively and efficiently done by sparse local attention. The proposed shift short attention effectively enables context extension, leading to non-trivial computation saving with similar performance to fine-tuning with vanilla attention. Particularly, it can be implemented with only two lines of code in training, while being optional in inference. On the other hand, we revisit the parameter-efficient fine-tuning regime for context expansion. Notably, we find that LoRA for context extension works well under the premise of trainable embedding and normalization. LongLoRA demonstrates strong empirical results on various tasks on LLaMA2 models from 7B/13B to 70B. LongLoRA adopts LLaMA2 7B from 4k context to 100k, or LLaMA2 70B to 32k on a single 8x A100 machine. LongLoRA extends models' context while retaining their original architectures, and is compatible with most existing techniques, like FlashAttention-2. In addition, to make LongLoRA practical, we collect a dataset, LongQA, for supervised fine-tuning. It contains more than 3k long context question-answer pairs.  ( 3 min )
    Brain Tumor Detection Using Deep Learning Approaches. (arXiv:2309.12193v1 [eess.IV])
    Brain tumors are collections of abnormal cells that can develop into masses or clusters. Because they have the potential to infiltrate other tissues, they pose a risk to the patient. The main imaging technique used, MRI, may be able to identify a brain tumor with accuracy. The fast development of Deep Learning methods for use in computer vision applications has been facilitated by a vast amount of training data and improvements in model construction that offer better approximations in a supervised setting. The need for these approaches has been the main driver of this expansion. Deep learning methods have shown promise in improving the precision of brain tumor detection and classification using magnetic resonance imaging (MRI). The study on the use of deep learning techniques, especially ResNet50, for brain tumor identification is presented in this abstract. As a result, this study investigates the possibility of automating the detection procedure using deep learning techniques. In this study, I utilized five transfer learning models which are VGG16, VGG19, DenseNet121, ResNet50 and YOLO V4 where ResNet50 provide the best or highest accuracy 99.54%. The goal of the study is to guide researchers and medical professionals toward powerful brain tumor detecting systems by employing deep learning approaches by way of this evaluation and analysis.  ( 2 min )
    Boolformer: Symbolic Regression of Logic Functions with Transformers. (arXiv:2309.12207v1 [cs.LG])
    In this work, we introduce Boolformer, the first Transformer architecture trained to perform end-to-end symbolic regression of Boolean functions. First, we show that it can predict compact formulas for complex functions which were not seen during training, when provided a clean truth table. Then, we demonstrate its ability to find approximate expressions when provided incomplete and noisy observations. We evaluate the Boolformer on a broad set of real-world binary classification datasets, demonstrating its potential as an interpretable alternative to classic machine learning methods. Finally, we apply it to the widespread task of modelling the dynamics of gene regulatory networks. Using a recent benchmark, we show that Boolformer is competitive with state-of-the art genetic algorithms with a speedup of several orders of magnitude. Our code and models are available publicly.  ( 2 min )
    A Survey on Transformers in Reinforcement Learning. (arXiv:2301.03044v3 [cs.LG] UPDATED)
    Transformer has been considered the dominating neural architecture in NLP and CV, mostly under supervised settings. Recently, a similar surge of using Transformers has appeared in the domain of reinforcement learning (RL), but it is faced with unique design choices and challenges brought by the nature of RL. However, the evolution of Transformers in RL has not yet been well unraveled. In this paper, we seek to systematically review motivations and progress on using Transformers in RL, provide a taxonomy on existing works, discuss each sub-field, and summarize future prospects.  ( 2 min )
    Learning End-to-End Channel Coding with Diffusion Models. (arXiv:2309.10505v2 [cs.IT] UPDATED)
    The training of neural encoders via deep learning necessitates a differentiable channel model due to the backpropagation algorithm. This requirement can be sidestepped by approximating either the channel distribution or its gradient through pilot signals in real-world scenarios. The initial approach draws upon the latest advancements in image generation, utilizing generative adversarial networks (GANs) or their enhanced variants to generate channel distributions. In this paper, we address this channel approximation challenge with diffusion models, which have demonstrated high sample quality in image generation. We offer an end-to-end channel coding framework underpinned by diffusion models and propose an efficient training algorithm. Our simulations with various channel models establish that our diffusion models learn the channel distribution accurately, thereby achieving near-optimal end-to-end symbol error rates (SERs). We also note a significant advantage of diffusion models: A robust generalization capability in high signal-to-noise ratio regions, in contrast to GAN variants that suffer from error floor. Furthermore, we examine the trade-off between sample quality and sampling speed, when an accelerated sampling algorithm is deployed, and investigate the effect of the noise scheduling on this trade-off. With an apt choice of noise scheduling, sampling time can be significantly reduced with a minor increase in SER.  ( 2 min )
    Semantic-aware Transmission Scheduling: a Monotonicity-driven Deep Reinforcement Learning Approach. (arXiv:2305.13706v2 [cs.LG] UPDATED)
    For cyber-physical systems in the 6G era, semantic communications connecting distributed devices for dynamic control and remote state estimation are required to guarantee application-level performance, not merely focus on communication-centric performance. Semantics here is a measure of the usefulness of information transmissions. Semantic-aware transmission scheduling of a large system often involves a large decision-making space, and the optimal policy cannot be obtained by existing algorithms effectively. In this paper, we first investigate the fundamental properties of the optimal semantic-aware scheduling policy and then develop advanced deep reinforcement learning (DRL) algorithms by leveraging the theoretical guidelines. Our numerical results show that the proposed algorithms can substantially reduce training time and enhance training performance compared to benchmark algorithms.
    Early diagnosis of autism spectrum disorder using machine learning approaches. (arXiv:2309.11646v1 [cs.LG])
    Autistic Spectrum Disorder (ASD) is a neurological disease characterized by difficulties with social interaction, communication, and repetitive activities. The severity of these difficulties varies, and those with this diagnosis face unique challenges. While its primary origin lies in genetics, identifying and addressing it early can contribute to the enhancement of the condition. In recent years, machine learning-driven intelligent diagnosis has emerged as a supplement to conventional clinical approaches, aiming to address the potential drawbacks of time-consuming and costly traditional methods. In this work, we utilize different machine learning algorithms to find the most significant traits responsible for ASD and to automate the diagnostic process. We study six classification models to see which model works best to identify ASD and also study five popular clustering methods to get a meaningful insight of these ASD datasets. To find the best classifier for these binary datasets, we evaluate the models using accuracy, precision, recall, specificity, F1-score, AUC, kappa and log loss metrics. Our evaluation demonstrates that five out of the six selected models perform exceptionally, achieving a 100% accuracy rate on the ASD datasets when hyperparameters are meticulously tuned for each model. As almost all classification models are able to get 100% accuracy, we become interested in observing the underlying insights of the datasets by implementing some popular clustering algorithms on these datasets. We calculate Normalized Mutual Information (NMI), Adjusted Rand Index (ARI) & Silhouette Coefficient (SC) metrics to select the best clustering models. Our evaluation finds that spectral clustering outperforms all other benchmarking clustering models in terms of NMI & ARI metrics and it also demonstrates comparability to the optimal SC achieved by k-means.
    Grassmann Manifold Flows for Stable Shape Generation. (arXiv:2211.02900v2 [cs.LG] UPDATED)
    Recently, studies on machine learning have focused on methods that use symmetry implicit in a specific manifold as an inductive bias. Grassmann manifolds provide the ability to handle fundamental shapes represented as shape spaces, enabling stable shape analysis. In this paper, we present a novel approach in which we establish the theoretical foundations for learning distributions on the Grassmann manifold via continuous normalization flows, with the explicit goal of generating stable shapes. Our approach facilitates more robust generation by effectively eliminating the influence of extraneous transformations, such as rotations and inversions, through learning and generating within a Grassmann manifolds designed to accommodate the essential shape information of the object. The experimental results indicated that the proposed method can generate high-quality samples by capturing the data structure. Furthermore, the proposed method significantly outperformed state-of-the-art methods in terms of the log-likelihood or evidence lower bound. The results obtained are expected to stimulate further research in this field, leading to advances for stable shape generation and analysis.
    Dynamic Hypergraph Structure Learning for Traffic Flow Forecasting. (arXiv:2309.12028v1 [cs.LG])
    This paper studies the problem of traffic flow forecasting, which aims to predict future traffic conditions on the basis of road networks and traffic conditions in the past. The problem is typically solved by modeling complex spatio-temporal correlations in traffic data using spatio-temporal graph neural networks (GNNs). However, the performance of these methods is still far from satisfactory since GNNs usually have limited representation capacity when it comes to complex traffic networks. Graphs, by nature, fall short in capturing non-pairwise relations. Even worse, existing methods follow the paradigm of message passing that aggregates neighborhood information linearly, which fails to capture complicated spatio-temporal high-order interactions. To tackle these issues, in this paper, we propose a novel model named Dynamic Hypergraph Structure Learning (DyHSL) for traffic flow prediction. To learn non-pairwise relationships, our DyHSL extracts hypergraph structural information to model dynamics in the traffic networks, and updates each node representation by aggregating messages from its associated hyperedges. Additionally, to capture high-order spatio-temporal relations in the road network, we introduce an interactive graph convolution block, which further models the neighborhood interaction for each node. Finally, we integrate these two views into a holistic multi-scale correlation extraction module, which conducts temporal pooling with different scales to model different temporal patterns. Extensive experiments on four popular traffic benchmark datasets demonstrate the effectiveness of our proposed DyHSL compared with a broad range of competing baselines.  ( 3 min )
    Self-Supervised Contrastive Learning for Robust Audio-Sheet Music Retrieval Systems. (arXiv:2309.12134v1 [cs.SD])
    Linking sheet music images to audio recordings remains a key problem for the development of efficient cross-modal music retrieval systems. One of the fundamental approaches toward this task is to learn a cross-modal embedding space via deep neural networks that is able to connect short snippets of audio and sheet music. However, the scarcity of annotated data from real musical content affects the capability of such methods to generalize to real retrieval scenarios. In this work, we investigate whether we can mitigate this limitation with self-supervised contrastive learning, by exposing a network to a large amount of real music data as a pre-training step, by contrasting randomly augmented views of snippets of both modalities, namely audio and sheet images. Through a number of experiments on synthetic and real piano data, we show that pre-trained models are able to retrieve snippets with better precision in all scenarios and pre-training configurations. Encouraged by these results, we employ the snippet embeddings in the higher-level task of cross-modal piece identification and conduct more experiments on several retrieval configurations. In this task, we observe that the retrieval quality improves from 30% up to 100% when real music data is present. We then conclude by arguing for the potential of self-supervised contrastive learning for alleviating the annotated data scarcity in multi-modal music retrieval models.
    Unsupervised Domain Adaptation for Self-Driving from Past Traversal Features. (arXiv:2309.12140v1 [cs.CV])
    The rapid development of 3D object detection systems for self-driving cars has significantly improved accuracy. However, these systems struggle to generalize across diverse driving environments, which can lead to safety-critical failures in detecting traffic participants. To address this, we propose a method that utilizes unlabeled repeated traversals of multiple locations to adapt object detectors to new driving environments. By incorporating statistics computed from repeated LiDAR scans, we guide the adaptation process effectively. Our approach enhances LiDAR-based detection models using spatial quantized historical features and introduces a lightweight regression head to leverage the statistics for feature regularization. Additionally, we leverage the statistics for a novel self-training process to stabilize the training. The framework is detector model-agnostic and experiments on real-world datasets demonstrate significant improvements, achieving up to a 20-point performance gain, especially in detecting pedestrians and distant objects. Code is available at https://github.com/zhangtravis/Hist-DA.
    Potential and limitations of random Fourier features for dequantizing quantum machine learning. (arXiv:2309.11647v1 [quant-ph])
    Quantum machine learning is arguably one of the most explored applications of near-term quantum devices. Much focus has been put on notions of variational quantum machine learning where parameterized quantum circuits (PQCs) are used as learning models. These PQC models have a rich structure which suggests that they might be amenable to efficient dequantization via random Fourier features (RFF). In this work, we establish necessary and sufficient conditions under which RFF does indeed provide an efficient dequantization of variational quantum machine learning for regression. We build on these insights to make concrete suggestions for PQC architecture design, and to identify structures which are necessary for a regression problem to admit a potential quantum advantage via PQC based optimization.
    TMac: Temporal Multi-Modal Graph Learning for Acoustic Event Classification. (arXiv:2309.11845v1 [cs.SD])
    Audiovisual data is everywhere in this digital age, which raises higher requirements for the deep learning models developed on them. To well handle the information of the multi-modal data is the key to a better audiovisual modal. We observe that these audiovisual data naturally have temporal attributes, such as the time information for each frame in the video. More concretely, such data is inherently multi-modal according to both audio and visual cues, which proceed in a strict chronological order. It indicates that temporal information is important in multi-modal acoustic event modeling for both intra- and inter-modal. However, existing methods deal with each modal feature independently and simply fuse them together, which neglects the mining of temporal relation and thus leads to sub-optimal performance. With this motivation, we propose a Temporal Multi-modal graph learning method for Acoustic event Classification, called TMac, by modeling such temporal information via graph learning techniques. In particular, we construct a temporal graph for each acoustic event, dividing its audio data and video data into multiple segments. Each segment can be considered as a node, and the temporal relationships between nodes can be considered as timestamps on their edges. In this case, we can smoothly capture the dynamic information in intra-modal and inter-modal. Several experiments are conducted to demonstrate TMac outperforms other SOTA models in performance. Our code is available at https://github.com/MGitHubL/TMac.
    An Efficient Consolidation of Word Embedding and Deep Learning Techniques for Classifying Anticancer Peptides: FastText+BiLSTM. (arXiv:2309.12058v1 [cs.LG])
    Anticancer peptides (ACPs) are a group of peptides that exhibite antineoplastic properties. The utilization of ACPs in cancer prevention can present a viable substitute for conventional cancer therapeutics, as they possess a higher degree of selectivity and safety. Recent scientific advancements generate an interest in peptide-based therapies which offer the advantage of efficiently treating intended cells without negatively impacting normal cells. However, as the number of peptide sequences continues to increase rapidly, developing a reliable and precise prediction model becomes a challenging task. In this work, our motivation is to advance an efficient model for categorizing anticancer peptides employing the consolidation of word embedding and deep learning models. First, Word2Vec and FastText are evaluated as word embedding techniques for the purpose of extracting peptide sequences. Then, the output of word embedding models are fed into deep learning approaches CNN, LSTM, BiLSTM. To demonstrate the contribution of proposed framework, extensive experiments are carried on widely-used datasets in the literature, ACPs250 and Independent. Experiment results show the usage of proposed model enhances classification accuracy when compared to the state-of-the-art studies. The proposed combination, FastText+BiLSTM, exhibits 92.50% of accuracy for ACPs250 dataset, and 96.15% of accuracy for Independent dataset, thence determining new state-of-the-art.
    Ensuring Topological Data-Structure Preservation under Autoencoder Compression due to Latent Space Regularization in Gauss--Legendre nodes. (arXiv:2309.08228v2 [cs.LG] UPDATED)
    We formulate a data independent latent space regularisation constraint for general unsupervised autoencoders. The regularisation rests on sampling the autoencoder Jacobian in Legendre nodes, being the centre of the Gauss-Legendre quadrature. Revisiting this classic enables to prove that regularised autoencoders ensure a one-to-one re-embedding of the initial data manifold to its latent representation. Demonstrations show that prior proposed regularisation strategies, such as contractive autoencoding, cause topological defects already for simple examples, and so do convolutional based (variational) autoencoders. In contrast, topological preservation is ensured already by standard multilayer perceptron neural networks when being regularised due to our contribution. This observation extends through the classic FashionMNIST dataset up to real world encoding problems for MRI brain scans, suggesting that, across disciplines, reliable low dimensional representations of complex high-dimensional datasets can be delivered due to this regularisation technique.  ( 2 min )
    A Variational Auto-Encoder Enabled Multi-Band Channel Prediction Scheme for Indoor Localization. (arXiv:2309.12200v1 [eess.SP])
    Indoor localization is getting increasing demands for various cutting-edged technologies, like Virtual/Augmented reality and smart home. Traditional model-based localization suffers from significant computational overhead, so fingerprint localization is getting increasing attention, which needs lower computation cost after the fingerprint database is built. However, the accuracy of indoor localization is limited by the complicated indoor environment which brings the multipath signal refraction. In this paper, we provided a scheme to improve the accuracy of indoor fingerprint localization from the frequency domain by predicting the channel state information (CSI) values from another transmitting channel and spliced the multi-band information together to get more precise localization results. We tested our proposed scheme on COST 2100 simulation data and real time orthogonal frequency division multiplexing (OFDM) WiFi data collected from an office scenario.  ( 2 min )
    Learning to Drive Anywhere. (arXiv:2309.12295v1 [cs.CV])
    Human drivers can seamlessly adapt their driving decisions across geographical locations with diverse conditions and rules of the road, e.g., left vs. right-hand traffic. In contrast, existing models for autonomous driving have been thus far only deployed within restricted operational domains, i.e., without accounting for varying driving behaviors across locations or model scalability. In this work, we propose AnyD, a single geographically-aware conditional imitation learning (CIL) model that can efficiently learn from heterogeneous and globally distributed data with dynamic environmental, traffic, and social characteristics. Our key insight is to introduce a high-capacity geo-location-based channel attention mechanism that effectively adapts to local nuances while also flexibly modeling similarities among regions in a data-driven manner. By optimizing a contrastive imitation objective, our proposed approach can efficiently scale across inherently imbalanced data distributions and location-dependent events. We demonstrate the benefits of our AnyD agent across multiple datasets, cities, and scalable deployment paradigms, i.e., centralized, semi-supervised, and distributed agent training. Specifically, AnyD outperforms CIL baselines by over 14% in open-loop evaluation and 30% in closed-loop testing on CARLA.  ( 2 min )
    Bayesian sparsification for deep neural networks with Bayesian model reduction. (arXiv:2309.12095v1 [stat.ML])
    Deep learning's immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification for deep learning emerges as a crucial approach, facilitating the design of models that are both computationally efficient and competitive in terms of performance across various deep learning applications. The state-of-the-art -- in Bayesian sparsification of deep neural networks -- combines structural shrinkage priors on model weights with an approximate inference scheme based on black-box stochastic variational inference. However, model inversion of the full generative model is exceptionally computationally demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate for the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning of model weights. As a generalization of the Savage-Dickey ratio, BMR allows a post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model. Our comparative study highlights the computational efficiency and the pruning rate of the BMR method relative to the established stochastic variational inference (SVI) scheme, when applied to the full hierarchical generative model. We illustrate the potential of BMR to prune model parameters across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
    Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States. (arXiv:2210.00997v3 [stat.ML] UPDATED)
    Consider an online convex optimization problem where the loss functions are self-concordant barriers, smooth relative to a convex function $h$, and possibly non-Lipschitz. We analyze the regret of online mirror descent with $h$. Then, based on the result, we prove the following in a unified manner. Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of exponentiated gradient due to Helmbold et al., is $\tilde{O} ( T^{2/3} d^{1/3} )$ when $T > 4 d / \log d$. This improves on the original $\tilde{O} ( T^{3/4} d^{1/2} )$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio selection, the regret of online mirror descent with the logarithmic barrier is $\tilde{O}(\sqrt{T d})$. The regret bound is the same as that of Soft-Bayes due to Orseau et al. up to logarithmic terms. 3. For online learning quantum states with the logarithmic loss, the regret of online mirror descent with the log-determinant function is also $\tilde{O} ( \sqrt{T d} )$. Its per-iteration time is shorter than all existing algorithms we know.  ( 3 min )
    FedGKD: Unleashing the Power of Collaboration in Federated Graph Neural Networks. (arXiv:2309.09517v3 [cs.LG] UPDATED)
    Federated training of Graph Neural Networks (GNN) has become popular in recent years due to its ability to perform graph-related tasks under data isolation scenarios while preserving data privacy. However, graph heterogeneity issues in federated GNN systems continue to pose challenges. Existing frameworks address the problem by representing local tasks using different statistics and relating them through a simple aggregation mechanism. However, these approaches suffer from limited efficiency from two aspects: low quality of task-relatedness quantification and inefficacy of exploiting the collaboration structure. To address these issues, we propose FedGKD, a novel federated GNN framework that utilizes a novel client-side graph dataset distillation method to extract task features that better describe task-relatedness, and introduces a novel server-side aggregation mechanism that is aware of the global collaboration structure. We conduct extensive experiments on six real-world datasets of different scales, demonstrating our framework's outperformance.
    Enhancing SAEAs with Unevaluated Solutions: A Case Study of Relation Model for Expensive Optimization. (arXiv:2309.11994v1 [cs.NE])
    Surrogate-assisted evolutionary algorithms (SAEAs) hold significant importance in resolving expensive optimization problems~(EOPs). Extensive efforts have been devoted to improving the efficacy of SAEAs through the development of proficient model-assisted selection methods. However, generating high-quality solutions is a prerequisite for selection. The fundamental paradigm of evaluating a limited number of solutions in each generation within SAEAs reduces the variance of adjacent populations, thus impacting the quality of offspring solutions. This is a frequently encountered issue, yet it has not gained widespread attention. This paper presents a framework using unevaluated solutions to enhance the efficiency of SAEAs. The surrogate model is employed to identify high-quality solutions for direct generation of new solutions without evaluation. To ensure dependable selection, we have introduced two tailored relation models for the selection of the optimal solution and the unevaluated population. A comprehensive experimental analysis is performed on two test suites, which showcases the superiority of the relation model over regression and classification models in the selection phase. Furthermore, the surrogate-selected unevaluated solutions with high potential have been shown to significantly enhance the efficiency of the algorithm.
    SQUARE: Automatic Question Answering Evaluation using Multiple Positive and Negative References. (arXiv:2309.12250v1 [cs.CL])
    Evaluation of QA systems is very challenging and expensive, with the most reliable approach being human annotations of correctness of answers for questions. Recent works (AVA, BEM) have shown that transformer LM encoder based similarity metrics transfer well for QA evaluation, but they are limited by the usage of a single correct reference answer. We propose a new evaluation metric: SQuArE (Sentence-level QUestion AnsweRing Evaluation), using multiple reference answers (combining multiple correct and incorrect references) for sentence-form QA. We evaluate SQuArE on both sentence-level extractive (Answer Selection) and generative (GenQA) QA systems, across multiple academic and industrial datasets, and show that it outperforms previous baselines and obtains the highest correlation with human annotations.
    ZeroFlow: Fast, Zero Label, Scalable Scene Flow via Distillation. (arXiv:2305.10424v5 [cs.CV] UPDATED)
    Scene flow estimation is the task of describing the 3D motion field between temporally successive point clouds. State-of-the-art methods use strong priors and test-time optimization techniques, but require on the order of tens of seconds to process large-scale point clouds, making them unusable as computer vision primitives for real-time applications such as open world object detection. Feed forward methods are considerably faster, running on the order of tens to hundreds of milliseconds for large-scale point clouds, but require expensive human supervision. To address both limitations, we propose Scene Flow via Distillation, a simple, scalable distillation framework that uses a label-free optimization method to produce pseudo-labels to supervise a feed forward model. Our instantiation of this framework, ZeroFlow, achieves state-of-the-art performance on the Argoverse 2 Self-Supervised Scene Flow Challenge while using zero human labels by simply training on large-scale, diverse unlabeled data. At test-time, ZeroFlow is over 1000$\times$ faster than label-free state-of-the-art optimization-based methods on large-scale point clouds and over 1000$\times$ cheaper to train on unlabeled data compared to the cost of human annotation of that data. To facilitate further research, we will release our code, trained model weights, and high quality pseudo-labels for the Argoverse 2 and Waymo Open datasets.
    Fast Adaptation with Bradley-Terry Preference Models in Text-To-Image Classification and Generation. (arXiv:2308.07929v2 [cs.CV] UPDATED)
    Recently, large multimodal models, such as CLIP and Stable Diffusion have experimented tremendous successes in both foundations and applications. However, as these models increase in parameter size and computational requirements, it becomes more challenging for users to personalize them for specific tasks or preferences. In this work, we address the problem of adapting the previous models towards sets of particular human preferences, aligning the retrieved or generated images with the preferences of the user. We leverage the Bradley-Terry preference model to develop a fast adaptation method that efficiently fine-tunes the original model, with few examples and with minimal computing resources. Extensive evidence of the capabilities of this framework is provided through experiments in different domains related to multimodal text and image understanding, including preference prediction as a reward model, and generation tasks.
    See to Touch: Learning Tactile Dexterity through Visual Incentives. (arXiv:2309.12300v1 [cs.RO])
    Equipping multi-fingered robots with tactile sensing is crucial for achieving the precise, contact-rich, and dexterous manipulation that humans excel at. However, relying solely on tactile sensing fails to provide adequate cues for reasoning about objects' spatial configurations, limiting the ability to correct errors and adapt to changing situations. In this paper, we present Tactile Adaptation from Visual Incentives (TAVI), a new framework that enhances tactile-based dexterity by optimizing dexterous policies using vision-based rewards. First, we use a contrastive-based objective to learn visual representations. Next, we construct a reward function using these visual representations through optimal-transport based matching on one human demonstration. Finally, we use online reinforcement learning on our robot to optimize tactile-based policies that maximize the visual reward. On six challenging tasks, such as peg pick-and-place, unstacking bowls, and flipping slender objects, TAVI achieves a success rate of 73% using our four-fingered Allegro robot hand. The increase in performance is 108% higher than policies using tactile and vision-based rewards and 135% higher than policies without tactile observational input. Robot videos are best viewed on our project website: https://see-to-touch.github.io/.
    Performance Conditioning for Diffusion-Based Multi-Instrument Music Synthesis. (arXiv:2309.12283v1 [cs.SD])
    Generating multi-instrument music from symbolic music representations is an important task in Music Information Retrieval (MIR). A central but still largely unsolved problem in this context is musically and acoustically informed control in the generation process. As the main contribution of this work, we propose enhancing control of multi-instrument synthesis by conditioning a generative model on a specific performance and recording environment, thus allowing for better guidance of timbre and style. Building on state-of-the-art diffusion-based music generative models, we introduce performance conditioning - a simple tool indicating the generative model to synthesize music with style and timbre of specific instruments taken from specific performances. Our prototype is evaluated using uncurated performances with diverse instrumentation and achieves state-of-the-art FAD realism scores while allowing novel timbre and style control. Our project page, including samples and demonstrations, is available at benadar293.github.io/midipm
    Subsampling Suffices for Adaptive Data Analysis. (arXiv:2302.08661v2 [cs.LG] UPDATED)
    Ensuring that analyses performed on a dataset are representative of the entire population is one of the central problems in statistics. Most classical techniques assume that the dataset is independent of the analyst's query and break down in the common setting where a dataset is reused for multiple, adaptively chosen, queries. This problem of \emph{adaptive data analysis} was formalized in the seminal works of Dwork et al. (STOC, 2015) and Hardt and Ullman (FOCS, 2014). We identify a remarkably simple set of assumptions under which the queries will continue to be representative even when chosen adaptively: The only requirements are that each query takes as input a random subsample and outputs few bits. This result shows that the noise inherent in subsampling is sufficient to guarantee that query responses generalize. The simplicity of this subsampling-based framework allows it to model a variety of real-world scenarios not covered by prior work. In addition to its simplicity, we demonstrate the utility of this framework by designing mechanisms for two foundational tasks, statistical queries and median finding. In particular, our mechanism for answering the broadly applicable class of statistical queries is both extremely simple and state of the art in many parameter regimes.
    GLM Regression with Oblivious Corruptions. (arXiv:2309.11657v1 [cs.DS])
    We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, \new{the noisy labels are of the form} $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$ \new{and satisfies} $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal N(0, \sigma^2)$. Our goal is to accurately recover a \new{parameter vector $w$ such that the} function $g(w \cdot x)$ \new{has} arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles \new{this} problem in its most general distribution-independent setting, where the solution may not \new{even} be identifiable. \new{Our} algorithm returns \new{an accurate estimate of} the solution if it is identifiable, and otherwise returns a small list of candidates, one of which is close to the true solution. Furthermore, we \new{provide} a necessary and sufficient condition for identifiability, which holds in broad settings. \new{Specifically,} the problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first \new{algorithmic} result for GLM regression \new{with oblivious noise} which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression, and gave algorithms under restrictive assumptions.
    S-GBDT: Frugal Differentially Private Gradient Boosting Decision Trees. (arXiv:2309.12041v1 [cs.CR])
    Privacy-preserving learning of gradient boosting decision trees (GBDT) has the potential for strong utility-privacy tradeoffs for tabular data, such as census data or medical meta data: classical GBDT learners can extract non-linear patterns from small sized datasets. The state-of-the-art notion for provable privacy-properties is differential privacy, which requires that the impact of single data points is limited and deniable. We introduce a novel differentially private GBDT learner and utilize four main techniques to improve the utility-privacy tradeoff. (1) We use an improved noise scaling approach with tighter accounting of privacy leakage of a decision tree leaf compared to prior work, resulting in noise that in expectation scales with $O(1/n)$, for $n$ data points. (2) We integrate individual R\'enyi filters to our method to learn from data points that have been underutilized during an iterative training process, which -- potentially of independent interest -- results in a natural yet effective insight to learning streams of non-i.i.d. data. (3) We incorporate the concept of random decision tree splits to concentrate privacy budget on learning leaves. (4) We deploy subsampling for privacy amplification. Our evaluation shows for the Abalone dataset ($<4k$ training data points) a $R^2$-score of $0.39$ for $\varepsilon=0.15$, which the closest prior work only achieved for $\varepsilon=10.0$. On the Adult dataset ($50k$ training data points) we achieve test error of $18.7\,\%$ for $\varepsilon=0.07$ which the closest prior work only achieved for $\varepsilon=1.0$. For the Abalone dataset for $\varepsilon=0.54$ we achieve $R^2$-score of $0.47$ which is very close to the $R^2$-score of $0.54$ for the nonprivate version of GBDT. For the Adult dataset for $\varepsilon=0.54$ we achieve test error $17.1\,\%$ which is very close to the test error $13.7\,\%$ of the nonprivate version of GBDT.
    Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening. (arXiv:2309.11687v1 [cs.LG])
    Virtual screening of large compound libraries to identify potential hit candidates is one of the earliest steps in drug discovery. As the size of commercially available compound collections grows exponentially to the scale of billions, brute-force virtual screening using traditional tools such as docking becomes infeasible in terms of time and computational resources. Active learning and Bayesian optimization has recently been proven as effective methods of narrowing down the search space. An essential component in those methods is a surrogate machine learning model that is trained with a small subset of the library to predict the desired properties of compounds. Accurate model can achieve high sample efficiency by finding the most promising compounds with only a fraction of the whole library being virtually screened. In this study, we examined the performance of pretrained transformer-based language model and graph neural network in Bayesian optimization active learning framework. The best pretrained models identifies 58.97% of the top-50000 by docking score after screening only 0.6% of an ultra-large library containing 99.5 million compounds, improving 8% over previous state-of-the-art baseline. Through extensive benchmarks, we show that the superior performance of pretrained models persists in both structure-based and ligand-based drug discovery. Such model can serve as a boost to the accuracy and sample efficiency of active learning based molecule virtual screening.
    Dictionary Attack on IMU-based Gait Authentication. (arXiv:2309.11766v1 [cs.CR])
    We present a novel adversarial model for authentication systems that use gait patterns recorded by the inertial measurement unit (IMU) built into smartphones. The attack idea is inspired by and named after the concept of a dictionary attack on knowledge (PIN or password) based authentication systems. In particular, this work investigates whether it is possible to build a dictionary of IMUGait patterns and use it to launch an attack or find an imitator who can actively reproduce IMUGait patterns that match the target's IMUGait pattern. Nine physically and demographically diverse individuals walked at various levels of four predefined controllable and adaptable gait factors (speed, step length, step width, and thigh-lift), producing 178 unique IMUGait patterns. Each pattern attacked a wide variety of user authentication models. The deeper analysis of error rates (before and after the attack) challenges the belief that authentication systems based on IMUGait patterns are the most difficult to spoof; further research is needed on adversarial models and associated countermeasures.
    Distilling Adversarial Prompts from Safety Benchmarks: Report for the Adversarial Nibbler Challenge. (arXiv:2309.11575v1 [cs.CV])
    Text-conditioned image generation models have recently achieved astonishing image quality and alignment results. Consequently, they are employed in a fast-growing number of applications. Since they are highly data-driven, relying on billion-sized datasets randomly scraped from the web, they also produce unsafe content. As a contribution to the Adversarial Nibbler challenge, we distill a large set of over 1,000 potential adversarial inputs from existing safety benchmarks. Our analysis of the gathered prompts and corresponding images demonstrates the fragility of input filters and provides further insights into systematic safety issues in current generative image models.
    Differential Evolution Algorithm based Hyper-Parameters Selection of Transformer Neural Network Model for Load Forecasting. (arXiv:2307.15299v3 [cs.NE] UPDATED)
    Accurate load forecasting plays a vital role in numerous sectors, but accurately capturing the complex dynamics of dynamic power systems remains a challenge for traditional statistical models. For these reasons, time-series models (ARIMA) and deep-learning models (ANN, LSTM, GRU, etc.) are commonly deployed and often experience higher success. In this paper, we analyze the efficacy of the recently developed Transformer-based Neural Network model in Load forecasting. Transformer models have the potential to improve Load forecasting because of their ability to learn long-range dependencies derived from their Attention Mechanism. We apply several metaheuristics namely Differential Evolution to find the optimal hyperparameters of the Transformer-based Neural Network to produce accurate forecasts. Differential Evolution provides scalable, robust, global solutions to non-differentiable, multi-objective, or constrained optimization problems. Our work compares the proposed Transformer based Neural Network model integrated with different metaheuristic algorithms by their performance in Load forecasting based on numerical metrics such as Mean Squared Error (MSE) and Mean Absolute Percentage Error (MAPE). Our findings demonstrate the potential of metaheuristic-enhanced Transformer-based Neural Network models in Load forecasting accuracy and provide optimal hyperparameters for each model.
    Meta OOD Learning for Continuously Adaptive OOD Detection. (arXiv:2309.11705v1 [cs.LG])
    Out-of-distribution (OOD) detection is crucial to modern deep learning applications by identifying and alerting about the OOD samples that should not be tested or used for making predictions. Current OOD detection methods have made significant progress when in-distribution (ID) and OOD samples are drawn from static distributions. However, this can be unrealistic when applied to real-world systems which often undergo continuous variations and shifts in ID and OOD distributions over time. Therefore, for an effective application in real-world systems, the development of OOD detection methods that can adapt to these dynamic and evolving distributions is essential. In this paper, we propose a novel and more realistic setting called continuously adaptive out-of-distribution (CAOOD) detection which targets on developing an OOD detection model that enables dynamic and quick adaptation to a new arriving distribution, with insufficient ID samples during deployment time. To address CAOOD, we develop meta OOD learning (MOL) by designing a learning-to-adapt diagram such that a good initialized OOD detection model is learned during the training process. In the testing process, MOL ensures OOD detection performance over shifting distributions by quickly adapting to new distributions with a few adaptations. Extensive experiments on several OOD benchmarks endorse the effectiveness of our method in preserving both ID classification accuracy and OOD detection performance on continuously shifting distributions.
    The Reversal Curse: LLMs trained on "A is B" fail to learn "B is A". (arXiv:2309.12288v1 [cs.CL])
    We expose a surprising failure of generalization in auto-regressive large language models (LLMs). If a model is trained on a sentence of the form "A is B", it will not automatically generalize to the reverse direction "B is A". This is the Reversal Curse. For instance, if a model is trained on "Olaf Scholz was the ninth Chancellor of Germany", it will not automatically be able to answer the question, "Who was the ninth Chancellor of Germany?". Moreover, the likelihood of the correct answer ("Olaf Scholz") will not be higher than for a random name. Thus, models exhibit a basic failure of logical deduction and do not generalize a prevalent pattern in their training set (i.e. if "A is B'' occurs, "B is A" is more likely to occur). We provide evidence for the Reversal Curse by finetuning GPT-3 and Llama-1 on fictitious statements such as "Uriah Hawthorne is the composer of 'Abyssal Melodies'" and showing that they fail to correctly answer "Who composed 'Abyssal Melodies?'". The Reversal Curse is robust across model sizes and model families and is not alleviated by data augmentation. We also evaluate ChatGPT (GPT-3.5 and GPT-4) on questions about real-world celebrities, such as "Who is Tom Cruise's mother? [A: Mary Lee Pfeiffer]" and the reverse "Who is Mary Lee Pfeiffer's son?". GPT-4 correctly answers questions like the former 79% of the time, compared to 33% for the latter. This shows a failure of logical deduction that we hypothesize is caused by the Reversal Curse. Code is available at https://github.com/lukasberglund/reversal_curse.
    Optimal Propagation for Graph Neural Networks. (arXiv:2205.02998v2 [cs.LG] UPDATED)
    Graph Neural Networks (GNNs) have achieved tremendous success in a variety of real-world applications by relying on the fixed graph data as input. However, the initial input graph might not be optimal in terms of specific downstream tasks, because of information scarcity, noise, adversarial attacks, or discrepancies between the distribution in graph topology, features, and groundtruth labels. In this paper, we propose a bi-level optimization approach for learning the optimal graph structure via directly learning the Personalized PageRank propagation matrix as well as the downstream semi-supervised node classification simultaneously. We also explore a low-rank approximation model for further reducing the time complexity. Empirical evaluations show the superior efficacy and robustness of the proposed model over all baseline methods.
    Weakly-supervised Automated Audio Captioning via text only training. (arXiv:2309.12242v1 [cs.SD])
    In recent years, datasets of paired audio and captions have enabled remarkable success in automatically generating descriptions for audio clips, namely Automated Audio Captioning (AAC). However, it is labor-intensive and time-consuming to collect a sufficient number of paired audio and captions. Motivated by the recent advances in Contrastive Language-Audio Pretraining (CLAP), we propose a weakly-supervised approach to train an AAC model assuming only text data and a pre-trained CLAP model, alleviating the need for paired target data. Our approach leverages the similarity between audio and text embeddings in CLAP. During training, we learn to reconstruct the text from the CLAP text embedding, and during inference, we decode using the audio embeddings. To mitigate the modality gap between the audio and text embeddings we employ strategies to bridge the gap during training and inference stages. We evaluate our proposed method on Clotho and AudioCaps datasets demonstrating its ability to achieve a relative performance of up to ~$83\%$ compared to fully supervised approaches trained with paired target data.
    Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets. (arXiv:2309.12032v1 [cs.LG])
    Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.
    Clustering-based Domain-Incremental Learning. (arXiv:2309.12078v1 [cs.LG])
    We consider the problem of learning multiple tasks in a continual learning setting in which data from different tasks is presented to the learner in a streaming fashion. A key challenge in this setting is the so-called "catastrophic forgetting problem", in which the performance of the learner in an "old task" decreases when subsequently trained on a "new task". Existing continual learning methods, such as Averaged Gradient Episodic Memory (A-GEM) and Orthogonal Gradient Descent (OGD), address catastrophic forgetting by minimizing the loss for the current task without increasing the loss for previous tasks. However, these methods assume the learner knows when the task changes, which is unrealistic in practice. In this paper, we alleviate the need to provide the algorithm with information about task changes by using an online clustering-based approach on a dynamically updated finite pool of samples or gradients. We thereby successfully counteract catastrophic forgetting in one of the hardest settings, namely: domain-incremental learning, a setting for which the problem was previously unsolved. We showcase the benefits of our approach by applying these ideas to projection-based methods, such as A-GEM and OGD, which lead to task-agnostic versions of them. Experiments on real datasets demonstrate the effectiveness of the proposed strategy and its promising performance compared to state-of-the-art methods.
    Traffic Forecasting on New Roads Using Spatial Contrastive Pre-Training (SCPT). (arXiv:2305.05237v4 [cs.LG] UPDATED)
    New roads are being constructed all the time. However, the capabilities of previous deep forecasting models to generalize to new roads not seen in the training data (unseen roads) are rarely explored. In this paper, we introduce a novel setup called a spatio-temporal (ST) split to evaluate the models' capabilities to generalize to unseen roads. In this setup, the models are trained on data from a sample of roads, but tested on roads not seen in the training data. Moreover, we also present a novel framework called Spatial Contrastive Pre-Training (SCPT) where we introduce a spatial encoder module to extract latent features from unseen roads during inference time. This spatial encoder is pre-trained using contrastive learning. During inference, the spatial encoder only requires two days of traffic data on the new roads and does not require any re-training. We also show that the output from the spatial encoder can be used effectively to infer latent node embeddings on unseen roads during inference time. The SCPT framework also incorporates a new layer, named the spatially gated addition (SGA) layer, to effectively combine the latent features from the output of the spatial encoder to existing backbones. Additionally, since there is limited data on the unseen roads, we argue that it is better to decouple traffic signals to trivial-to-capture periodic signals and difficult-to-capture Markovian signals, and for the spatial encoder to only learn the Markovian signals. Finally, we empirically evaluated SCPT using the ST split setup on four real-world datasets. The results showed that adding SCPT to a backbone consistently improves forecasting performance on unseen roads. More importantly, the improvements are greater when forecasting further into the future. The codes are available on GitHub: https://github.com/cruiseresearchgroup/forecasting-on-new-roads .
    A Dynamic Domain Adaptation Deep Learning Network for EEG-based Motor Imagery Classification. (arXiv:2309.11714v1 [eess.SP])
    There is a correlation between adjacent channels of electroencephalogram (EEG), and how to represent this correlation is an issue that is currently being explored. In addition, due to inter-individual differences in EEG signals, this discrepancy results in new subjects need spend a amount of calibration time for EEG-based motor imagery brain-computer interface. In order to solve the above problems, we propose a Dynamic Domain Adaptation Based Deep Learning Network (DADL-Net). First, the EEG data is mapped to the three-dimensional geometric space and its temporal-spatial features are learned through the 3D convolution module, and then the spatial-channel attention mechanism is used to strengthen the features, and the final convolution module can further learn the spatial-temporal information of the features. Finally, to account for inter-subject and cross-sessions differences, we employ a dynamic domain-adaptive strategy, the distance between features is reduced by introducing a Maximum Mean Discrepancy loss function, and the classification layer is fine-tuned by using part of the target domain data. We verify the performance of the proposed method on BCI competition IV 2a and OpenBMI datasets. Under the intra-subject experiment, the accuracy rates of 70.42% and 73.91% were achieved on the OpenBMI and BCIC IV 2a datasets.
    Towards Robust and Truly Large-Scale Audio-Sheet Music Retrieval. (arXiv:2309.12158v1 [cs.SD])
    A range of applications of multi-modal music information retrieval is centred around the problem of connecting large collections of sheet music (images) to corresponding audio recordings, that is, identifying pairs of audio and score excerpts that refer to the same musical content. One of the typical and most recent approaches to this task employs cross-modal deep learning architectures to learn joint embedding spaces that link the two distinct modalities - audio and sheet music images. While there has been steady improvement on this front over the past years, a number of open problems still prevent large-scale employment of this methodology. In this article we attempt to provide an insightful examination of the current developments on audio-sheet music retrieval via deep learning methods. We first identify a set of main challenges on the road towards robust and large-scale cross-modal music retrieval in real scenarios. We then highlight the steps we have taken so far to address some of these challenges, documenting step-by-step improvement along several dimensions. We conclude by analysing the remaining challenges and present ideas for solving these, in order to pave the way to a unified and robust methodology for cross-modal music retrieval.
    Electroencephalogram Sensor Data Compression Using An Asymmetrical Sparse Autoencoder With A Discrete Cosine Transform Layer. (arXiv:2309.12201v1 [eess.SP])
    Electroencephalogram (EEG) data compression is necessary for wireless recording applications to reduce the amount of data that needs to be transmitted. In this paper, an asymmetrical sparse autoencoder with a discrete cosine transform (DCT) layer is proposed to compress EEG signals. The encoder module of the autoencoder has a combination of a fully connected linear layer and the DCT layer to reduce redundant data using hard-thresholding nonlinearity. Furthermore, the DCT layer includes trainable hard-thresholding parameters and scaling layers to give emphasis or de-emphasis on individual DCT coefficients. Finally, the one-by-one convolutional layer generates the latent space. The sparsity penalty-based cost function is employed to keep the feature map as sparse as possible in the latent space. The latent space data is transmitted to the receiver. The decoder module of the autoencoder is designed using the inverse DCT and two fully connected linear layers to improve the accuracy of data reconstruction. In comparison to other state-of-the-art methods, the proposed method significantly improves the average quality score in various data compression experiments.
    Fairness Vs. Personalization: Towards Equity in Epistemic Utility. (arXiv:2309.11503v1 [cs.IR])
    The applications of personalized recommender systems are rapidly expanding: encompassing social media, online shopping, search engine results, and more. These systems offer a more efficient way to navigate the vast array of items available. However, alongside this growth, there has been increased recognition of the potential for algorithmic systems to exhibit and perpetuate biases, risking unfairness in personalized domains. In this work, we explicate the inherent tension between personalization and conventional implementations of fairness. As an alternative, we propose equity to achieve fairness in the context of epistemic utility. We provide a mapping between goals and practical implementations and detail policy recommendations across key stakeholders to forge a path towards achieving fairness in personalized systems.
    Global universal approximation of functional input maps on weighted spaces. (arXiv:2306.03303v2 [stat.ML] UPDATED)
    We introduce so-called functional input neural networks defined on a possibly infinite dimensional weighted space with values also in a possibly infinite dimensional output space. To this end, we use an additive family as hidden layer maps and a non-linear activation function applied to each hidden layer. Relying on Stone-Weierstrass theorems on weighted spaces, we can prove a global universal approximation result for generalizations of continuous functions going beyond the usual approximation on compact sets. This then applies in particular to approximation of (non-anticipative) path space functionals via functional input neural networks. As a further application of the weighted Stone-Weierstrass theorem we prove a global universal approximation result for linear functions of the signature. We also introduce the viewpoint of Gaussian process regression in this setting and show that the reproducing kernel Hilbert space of the signature kernels are Cameron-Martin spaces of certain Gaussian processes. This paves the way towards uncertainty quantification for signature kernel regression.
    Incentivized Communication for Federated Bandits. (arXiv:2309.11702v1 [cs.LG])
    Most existing works on federated bandits take it for granted that all clients are altruistic about sharing their data with the server for the collective good whenever needed. Despite their compelling theoretical guarantee on performance and communication efficiency, this assumption is overly idealistic and oftentimes violated in practice, especially when the algorithm is operated over self-interested clients, who are reluctant to share data without explicit benefits. Negligence of such self-interested behaviors can significantly affect the learning efficiency and even the practical operability of federated bandit learning. In light of this, we aim to spark new insights into this under-explored research area by formally introducing an incentivized communication problem for federated bandits, where the server shall motivate clients to share data by providing incentives. Without loss of generality, we instantiate this bandit problem with the contextual linear setting and propose the first incentivized communication protocol, namely, Inc-FedUCB, that achieves near-optimal regret with provable communication and incentive cost guarantees. Extensive empirical experiments on both synthetic and real-world datasets further validate the effectiveness of the proposed method across various environments.
    Efficient Core-selecting Incentive Mechanism for Data Sharing in Federated Learning. (arXiv:2309.11722v1 [cs.GT])
    Federated learning is a distributed machine learning system that uses participants' data to train an improved global model. In federated learning, participants cooperatively train a global model, and they will receive the global model and payments. Rational participants try to maximize their individual utility, and they will not input their high-quality data truthfully unless they are provided with satisfactory payments based on their data quality. Furthermore, federated learning benefits from the cooperative contributions of participants. Accordingly, how to establish an incentive mechanism that both incentivizes inputting data truthfully and promotes stable cooperation has become an important issue to consider. In this paper, we introduce a data sharing game model for federated learning and employ game-theoretic approaches to design a core-selecting incentive mechanism by utilizing a popular concept in cooperative games, the core. In federated learning, the core can be empty, resulting in the core-selecting mechanism becoming infeasible. To address this, our core-selecting mechanism employs a relaxation method and simultaneously minimizes the benefits of inputting false data for all participants. However, this mechanism is computationally expensive because it requires aggregating exponential models for all possible coalitions, which is infeasible in federated learning. To address this, we propose an efficient core-selecting mechanism based on sampling approximation that only aggregates models on sampled coalitions to approximate the exact result. Extensive experiments verify that the efficient core-selecting mechanism can incentivize inputting high-quality data and stable cooperation, while it reduces computational overhead compared to the core-selecting mechanism.
    A Comprehensive Review of Community Detection in Graphs. (arXiv:2309.11798v1 [cs.SI])
    The study of complex networks has significantly advanced our understanding of community structures which serves as a crucial feature of real-world graphs. Detecting communities in graphs is a challenging problem with applications in sociology, biology, and computer science. Despite the efforts of an interdisciplinary community of scientists, a satisfactory solution to this problem has not yet been achieved. This review article delves into the topic of community detection in graphs, which serves as a crucial role in understanding the organization and functioning of complex systems. We begin by introducing the concept of community structure, which refers to the arrangement of vertices into clusters, with strong internal connections and weaker connections between clusters. Then, we provide a thorough exposition of various community detection methods, including a new method designed by us. Additionally, we explore real-world applications of community detection in diverse networks. In conclusion, this comprehensive review provides a deep understanding of community detection in graphs. It serves as a valuable resource for researchers and practitioners in multiple disciplines, offering insights into the challenges, methodologies, and applications of community detection in complex networks.
    ConDA: Contrastive Domain Adaptation for AI-generated Text Detection. (arXiv:2309.03992v2 [cs.CL] UPDATED)
    Large language models (LLMs) are increasingly being used for generating text in a variety of use cases, including journalistic news articles. Given the potential malicious nature in which these LLMs can be used to generate disinformation at scale, it is important to build effective detectors for such AI-generated text. Given the surge in development of new LLMs, acquiring labeled training data for supervised detectors is a bottleneck. However, there might be plenty of unlabeled text data available, without information on which generator it came from. In this work we tackle this data problem, in detecting AI-generated news text, and frame the problem as an unsupervised domain adaptation task. Here the domains are the different text generators, i.e. LLMs, and we assume we have access to only the labeled source data and unlabeled target data. We develop a Contrastive Domain Adaptation framework, called ConDA, that blends standard domain adaptation techniques with the representation power of contrastive learning to learn domain invariant representations that are effective for the final unsupervised detection task. Our experiments demonstrate the effectiveness of our framework, resulting in average performance gains of 31.7% from the best performing baselines, and within 0.8% margin of a fully supervised detector. All our code and data is available at https://github.com/AmritaBh/ConDA-gen-text-detection.
    DimCL: Dimensional Contrastive Learning For Improving Self-Supervised Learning. (arXiv:2309.11782v1 [cs.CV])
    Self-supervised learning (SSL) has gained remarkable success, for which contrastive learning (CL) plays a key role. However, the recent development of new non-CL frameworks has achieved comparable or better performance with high improvement potential, prompting researchers to enhance these frameworks further. Assimilating CL into non-CL frameworks has been thought to be beneficial, but empirical evidence indicates no visible improvements. In view of that, this paper proposes a strategy of performing CL along the dimensional direction instead of along the batch direction as done in conventional contrastive learning, named Dimensional Contrastive Learning (DimCL). DimCL aims to enhance the feature diversity, and it can serve as a regularizer to prior SSL frameworks. DimCL has been found to be effective, and the hardness-aware property is identified as a critical reason for its success. Extensive experimental results reveal that assimilating DimCL into SSL frameworks leads to performance improvement by a non-trivial margin on various datasets and backbone architectures.
    Privacy-Preserving In-Context Learning with Differentially Private Few-Shot Generation. (arXiv:2309.11765v1 [cs.LG])
    We study the problem of in-context learning (ICL) with large language models (LLMs) on private datasets. This scenario poses privacy risks, as LLMs may leak or regurgitate the private examples demonstrated in the prompt. We propose a novel algorithm that generates synthetic few-shot demonstrations from the private dataset with formal differential privacy (DP) guarantees, and show empirically that it can achieve effective ICL. We conduct extensive experiments on standard benchmarks and compare our algorithm with non-private ICL and zero-shot solutions. Our results demonstrate that our algorithm can achieve competitive performance with strong privacy levels. These results open up new possibilities for ICL with privacy protection for a broad range of applications.  ( 2 min )
    Quantum Conformal Prediction for Reliable Uncertainty Quantification in Quantum Machine Learning. (arXiv:2304.03398v2 [quant-ph] UPDATED)
    Quantum machine learning is a promising programming paradigm for the optimization of quantum algorithms in the current era of noisy intermediate scale quantum (NISQ) computers. A fundamental challenge in quantum machine learning is generalization, as the designer targets performance under testing conditions, while having access only to limited training data. Existing generalization analyses, while identifying important general trends and scaling laws, cannot be used to assign reliable and informative "error bars" to the decisions made by quantum models. In this article, we propose a general methodology that can reliably quantify the uncertainty of quantum models, irrespective of the amount of training data, of the number of shots, of the ansatz, of the training algorithm, and of the presence of quantum hardware noise. The approach, which builds on probabilistic conformal prediction, turns an arbitrary, possibly small, number of shots from a pre-trained quantum model into a set prediction, e.g., an interval, that provably contains the true target with any desired coverage level. Experimental results confirm the theoretical calibration guarantees of the proposed framework, referred to as quantum conformal prediction.
    Hierarchical reinforcement learning with natural language subgoals. (arXiv:2309.11564v1 [cs.LG])
    Hierarchical reinforcement learning has been a compelling approach for achieving goal directed behavior over long sequences of actions. However, it has been challenging to implement in realistic or open-ended environments. A main challenge has been to find the right space of sub-goals over which to instantiate a hierarchy. We present a novel approach where we use data from humans solving these tasks to softly supervise the goal space for a set of long range tasks in a 3D embodied environment. In particular, we use unconstrained natural language to parameterize this space. This has two advantages: first, it is easy to generate this data from naive human participants; second, it is flexible enough to represent a vast range of sub-goals in human-relevant tasks. Our approach outperforms agents that clone expert behavior on these tasks, as well as HRL from scratch without this supervised sub-goal space. Our work presents a novel approach to combining human expert supervision with the benefits and flexibility of reinforcement learning.
    Regionally Additive Models: Explainable-by-design models minimizing feature interactions. (arXiv:2309.12215v1 [cs.LG])
    Generalized Additive Models (GAMs) are widely used explainable-by-design models in various applications. GAMs assume that the output can be represented as a sum of univariate functions, referred to as components. However, this assumption fails in ML problems where the output depends on multiple features simultaneously. In these cases, GAMs fail to capture the interaction terms of the underlying function, leading to subpar accuracy. To (partially) address this issue, we propose Regionally Additive Models (RAMs), a novel class of explainable-by-design models. RAMs identify subregions within the feature space where interactions are minimized. Within these regions, it is more accurate to express the output as a sum of univariate functions (components). Consequently, RAMs fit one component per subregion of each feature instead of one component per feature. This approach yields a more expressive model compared to GAMs while retaining interpretability. The RAM framework consists of three steps. Firstly, we train a black-box model. Secondly, using Regional Effect Plots, we identify subregions where the black-box model exhibits near-local additivity. Lastly, we fit a GAM component for each identified subregion. We validate the effectiveness of RAMs through experiments on both synthetic and real-world datasets. The results confirm that RAMs offer improved expressiveness compared to GAMs while maintaining interpretability.
    Prodigy: An Expeditiously Adaptive Parameter-Free Learner. (arXiv:2306.06101v2 [cs.LG] UPDATED)
    We consider the problem of estimating the learning rate in adaptive methods, such as Adagrad and Adam. We describe two techniques, Prodigy and Resetting, to provably estimate the distance to the solution $D$, which is needed to set the learning rate optimally. Our techniques are modifications of the D-Adaptation method for learning-rate-free learning. Our methods improve upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test our methods on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on Criteo dataset, VarNet on Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approaches consistently outperform D-Adaptation and reach test accuracy values close to that of hand-tuned Adam.
    Multi-agent Deep Covering Skill Discovery. (arXiv:2210.03269v3 [cs.LG] UPDATED)
    The use of skills (a.k.a., options) can greatly accelerate exploration in reinforcement learning, especially when only sparse reward signals are available. While option discovery methods have been proposed for individual agents, in multi-agent reinforcement learning settings, discovering collaborative options that can coordinate the behavior of multiple agents and encourage them to visit the under-explored regions of their joint state space has not been considered. In this case, we propose Multi-agent Deep Covering Option Discovery, which constructs the multi-agent options through minimizing the expected cover time of the multiple agents' joint state space. Also, we propose a novel framework to adopt the multi-agent options in the MARL process. In practice, a multi-agent task can usually be divided into some sub-tasks, each of which can be completed by a sub-group of the agents. Therefore, our algorithm framework first leverages an attention mechanism to find collaborative agent sub-groups that would benefit most from coordinated actions. Then, a hierarchical algorithm, namely HA-MSAC, is developed to learn the multi-agent options for each sub-group to complete their sub-tasks first, and then to integrate them through a high-level policy as the solution of the whole task. This hierarchical option construction allows our framework to strike a balance between scalability and effective collaboration among the agents. The evaluation based on multi-agent collaborative tasks shows that the proposed algorithm can effectively capture the agent interactions with the attention mechanism, successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options, in terms of both faster exploration and higher task rewards.
    Multiclass Learnability Does Not Imply Sample Compression. (arXiv:2308.06424v2 [cs.LG] UPDATED)
    A hypothesis class admits a sample compression scheme, if for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample, using which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the subsample produced. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme of size only a finite function of its VC dimension, independent of the sample size. For multiclass hypothesis classes, the analog of VC dimension is the DS dimension. We show that the analogous statement pertaining to sample compression is not true for multiclass hypothesis classes: every learnable multiclass hypothesis class, which must necessarily have finite DS dimension, does not admit a sample compression scheme of size only a finite function of its DS dimension.
    Text2Reward: Automated Dense Reward Function Generation for Reinforcement Learning. (arXiv:2309.11489v2 [cs.LG] UPDATED)
    Designing reward functions is a longstanding challenge in reinforcement learning (RL); it requires specialized knowledge or domain data, leading to high costs for development. To address this, we introduce Text2Reward, a data-free framework that automates the generation of dense reward functions based on large language models (LLMs). Given a goal described in natural language, Text2Reward generates dense reward functions as an executable program grounded in a compact representation of the environment. Unlike inverse RL and recent work that uses LLMs to write sparse reward codes, Text2Reward produces interpretable, free-form dense reward codes that cover a wide range of tasks, utilize existing packages, and allow iterative refinement with human feedback. We evaluate Text2Reward on two robotic manipulation benchmarks (ManiSkill2, MetaWorld) and two locomotion environments of MuJoCo. On 13 of the 17 manipulation tasks, policies trained with generated reward codes achieve similar or better task success rates and convergence speed than expert-written reward codes. For locomotion tasks, our method learns six novel locomotion behaviors with a success rate exceeding 94%. Furthermore, we show that the policies trained in the simulator with our method can be deployed in the real world. Finally, Text2Reward further improves the policies by refining their reward functions with human feedback. Video results are available at https://text-to-reward.github.io
    A Constructive Approach to Function Realization by Neural Stochastic Differential Equations. (arXiv:2307.00215v2 [math.OC] UPDATED)
    The problem of function approximation by neural dynamical systems has typically been approached in a top-down manner: Any continuous function can be approximated to an arbitrary accuracy by a sufficiently complex model with a given architecture. This can lead to high-complexity controls which are impractical in applications. In this paper, we take the opposite, constructive approach: We impose various structural restrictions on system dynamics and consequently characterize the class of functions that can be realized by such a system. The systems are implemented as a cascade interconnection of a neural stochastic differential equation (Neural SDE), a deterministic dynamical system, and a readout map. Both probabilistic and geometric (Lie-theoretic) methods are used to characterize the classes of functions realized by such systems.
    A Machine Learning-oriented Survey on Tiny Machine Learning. (arXiv:2309.11932v1 [cs.LG])
    The emergence of Tiny Machine Learning (TinyML) has positively revolutionized the field of Artificial Intelligence by promoting the joint design of resource-constrained IoT hardware devices and their learning-based software architectures. TinyML carries an essential role within the fourth and fifth industrial revolutions in helping societies, economies, and individuals employ effective AI-infused computing technologies (e.g., smart cities, automotive, and medical robotics). Given its multidisciplinary nature, the field of TinyML has been approached from many different angles: this comprehensive survey wishes to provide an up-to-date overview focused on all the learning algorithms within TinyML-based solutions. The survey is based on the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) methodological flow, allowing for a systematic and complete literature survey. In particular, firstly we will examine the three different workflows for implementing a TinyML-based system, i.e., ML-oriented, HW-oriented, and co-design. Secondly, we propose a taxonomy that covers the learning panorama under the TinyML lens, examining in detail the different families of model optimization and design, as well as the state-of-the-art learning techniques. Thirdly, this survey will present the distinct features of hardware devices and software tools that represent the current state-of-the-art for TinyML intelligent edge applications. Finally, we discuss the challenges and future directions.
    On the Probability of Immunity. (arXiv:2309.11942v1 [stat.ME])
    This work is devoted to the study of the probability of immunity, i.e. the effect occurs whether exposed or not. We derive necessary and sufficient conditions for non-immunity and $\epsilon$-bounded immunity, i.e. the probability of immunity is zero and $\epsilon$-bounded, respectively. The former allows us to estimate the probability of benefit (i.e., the effect occurs if and only if exposed) from a randomized controlled trial, and the latter allows us to produce bounds of the probability of benefit that are tighter than the existing ones. We also introduce the concept of indirect immunity (i.e., through a mediator) and repeat our previous analysis for it. Finally, we propose a method for sensitivity analysis of the probability of immunity under unmeasured confounding.
    Quasi-Monte Carlo for 3D Sliced Wasserstein. (arXiv:2309.11713v1 [stat.ML])
    Monte Carlo (MC) approximation has been used as the standard computation approach for the Sliced Wasserstein (SW) distance, which has an intractable expectation in its analytical form. However, the MC method is not optimal in terms of minimizing the absolute approximation error. To provide a better class of empirical SW, we propose quasi-sliced Wasserstein (QSW) approximations that rely on Quasi-Monte Carlo (QMC) methods. For a comprehensive investigation of QMC for SW, we focus on the 3D setting, specifically computing the SW between probability measures in three dimensions. In greater detail, we empirically verify various ways of constructing QMC points sets on the 3D unit-hypersphere, including Gaussian-based mapping, equal area mapping, generalized spiral points, and optimizing discrepancy energies. Furthermore, to obtain an unbiased estimation for stochastic optimization, we extend QSW into Randomized Quasi-Sliced Wasserstein (RQSW) by introducing randomness to the discussed low-discrepancy sequences. For theoretical properties, we prove the asymptotic convergence of QSW and the unbiasedness of RQSW. Finally, we conduct experiments on various 3D tasks, such as point-cloud comparison, point-cloud interpolation, image style transfer, and training deep point-cloud autoencoders, to demonstrate the favorable performance of the proposed QSW and RQSW variants.  ( 2 min )
    EPTQ: Enhanced Post-Training Quantization via Label-Free Hessian. (arXiv:2309.11531v1 [cs.CV])
    Quantization of deep neural networks (DNN) has become a key element in the efforts of embedding such networks on end-user devices. However, current quantization methods usually suffer from costly accuracy degradation. In this paper, we propose a new method for Enhanced Post Training Quantization named EPTQ. The method is based on knowledge distillation with an adaptive weighting of layers. In addition, we introduce a new label-free technique for approximating the Hessian trace of the task loss, named Label-Free Hessian. This technique removes the requirement of a labeled dataset for computing the Hessian. The adaptive knowledge distillation uses the Label-Free Hessian technique to give greater attention to the sensitive parts of the model while performing the optimization. Empirically, by employing EPTQ we achieve state-of-the-art results on a wide variety of models, tasks, and datasets, including ImageNet classification, COCO object detection, and Pascal-VOC for semantic segmentation. We demonstrate the performance and compatibility of EPTQ on an extended set of architectures, including CNNs, Transformers, hybrid, and MLP-only models.  ( 2 min )
    Bayesian Flow Networks. (arXiv:2308.07037v2 [cs.LG] UPDATED)
    This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input to a neural network that outputs a second, interdependent distribution. Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models; however it is conceptually simpler in that no forward process is required. Discrete and continuous-time loss functions are derived for continuous, discretised and discrete data, along with sample generation procedures. Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling. The loss function directly optimises data compression and places no restrictions on the network architecture. In our experiments BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
    Unveiling Optimal SDG Pathways: An Innovative Approach Leveraging Graph Pruning and Intent Graph for Effective Recommendations. (arXiv:2309.11741v1 [cs.IR])
    The recommendation of appropriate development pathways, also known as ecological civilization patterns for achieving Sustainable Development Goals (namely, sustainable development patterns), are of utmost importance for promoting ecological, economic, social, and resource sustainability in a specific region. To achieve this, the recommendation process must carefully consider the region's natural, environmental, resource, and economic characteristics. However, current recommendation algorithms in the field of computer science fall short in adequately addressing the spatial heterogeneity related to environment and sparsity of regional historical interaction data, which limits their effectiveness in recommending sustainable development patterns. To overcome these challenges, this paper proposes a method called User Graph after Pruning and Intent Graph (UGPIG). Firstly, we utilize the high-density linking capability of the pruned User Graph to address the issue of spatial heterogeneity neglect in recommendation algorithms. Secondly, we construct an Intent Graph by incorporating the intent network, which captures the preferences for attributes including environmental elements of target regions. This approach effectively alleviates the problem of sparse historical interaction data in the region. Through extensive experiments, we demonstrate that UGPIG outperforms state-of-the-art recommendation algorithms like KGCN, KGAT, and KGIN in sustainable development pattern recommendations, with a maximum improvement of 9.61% in Top-3 recommendation performance.  ( 2 min )
    CATS: Conditional Adversarial Trajectory Synthesis for Privacy-Preserving Trajectory Data Publication Using Deep Learning Approaches. (arXiv:2309.11587v1 [cs.LG])
    The prevalence of ubiquitous location-aware devices and mobile Internet enables us to collect massive individual-level trajectory dataset from users. Such trajectory big data bring new opportunities to human mobility research but also raise public concerns with regard to location privacy. In this work, we present the Conditional Adversarial Trajectory Synthesis (CATS), a deep-learning-based GeoAI methodological framework for privacy-preserving trajectory data generation and publication. CATS applies K-anonymity to the underlying spatiotemporal distributions of human movements, which provides a distributional-level strong privacy guarantee. By leveraging conditional adversarial training on K-anonymized human mobility matrices, trajectory global context learning using the attention-based mechanism, and recurrent bipartite graph matching of adjacent trajectory points, CATS is able to reconstruct trajectory topology from conditionally sampled locations and generate high-quality individual-level synthetic trajectory data, which can serve as supplements or alternatives to raw data for privacy-preserving trajectory data publication. The experiment results on over 90k GPS trajectories show that our method has a better performance in privacy preservation, spatiotemporal characteristic preservation, and downstream utility compared with baseline methods, which brings new insights into privacy-preserving human mobility research using generative AI techniques and explores data ethics issues in GIScience.
    Drift Control of High-Dimensional RBM: A Computational Method Based on Neural Networks. (arXiv:2309.11651v1 [eess.SY])
    Motivated by applications in queueing theory, we consider a stochastic control problem whose state space is the $d$-dimensional positive orthant. The controlled process $Z$ evolves as a reflected Brownian motion whose covariance matrix is exogenously specified, as are its directions of reflection from the orthant's boundary surfaces. A system manager chooses a drift vector $\theta(t)$ at each time $t$ based on the history of $Z$, and the cost rate at time $t$ depends on both $Z(t)$ and $\theta(t)$. In our initial problem formulation, the objective is to minimize expected discounted cost over an infinite planning horizon, after which we treat the corresponding ergodic control problem. Extending earlier work by Han et al. (Proceedings of the National Academy of Sciences, 2018, 8505-8510), we develop and illustrate a simulation-based computational method that relies heavily on deep neural network technology. For test problems studied thus far, our method is accurate to within a fraction of one percent, and is computationally feasible in dimensions up to at least $d=30$.
    Private Matrix Factorization with Public Item Features. (arXiv:2309.11516v1 [cs.IR])
    We consider the problem of training private recommendation models with access to public item features. Training with Differential Privacy (DP) offers strong privacy guarantees, at the expense of loss in recommendation quality. We show that incorporating public item features during training can help mitigate this loss in quality. We propose a general approach based on collective matrix factorization (CMF), that works by simultaneously factorizing two matrices: the user feedback matrix (representing sensitive data) and an item feature matrix that encodes publicly available (non-sensitive) item information. The method is conceptually simple, easy to tune, and highly scalable. It can be applied to different types of public item data, including: (1) categorical item features; (2) item-item similarities learned from public sources; and (3) publicly available user feedback. Furthermore, these data modalities can be collectively utilized to fully leverage public data. Evaluating our method on a standard DP recommendation benchmark, we find that using public item features significantly narrows the quality gap between private models and their non-private counterparts. As privacy constraints become more stringent, models rely more heavily on public side features for recommendation. This results in a smooth transition from collaborative filtering to item-based contextual recommendations.
    Instruction Tuning for Large Language Models: A Survey. (arXiv:2308.10792v2 [cs.CL] UPDATED)
    This paper surveys research works in the quickly advancing field of instruction tuning (IT), a crucial technique to enhance the capabilities and controllability of large language models (LLMs). Instruction tuning refers to the process of further training LLMs on a dataset consisting of \textsc{(instruction, output)} pairs in a supervised fashion, which bridges the gap between the next-word prediction objective of LLMs and the users' objective of having LLMs adhere to human instructions. In this work, we make a systematic review of the literature, including the general methodology of IT, the construction of IT datasets, the training of IT models, and applications to different modalities, domains and applications, along with an analysis on aspects that influence the outcome of IT (e.g., generation of instruction outputs, size of the instruction dataset, etc). We also review the potential pitfalls of IT along with criticism against it, along with efforts pointing out current deficiencies of existing strategies and suggest some avenues for fruitful research.
    Likelihood-based Sensor Calibration for Expert-Supported Distributed Learning Algorithms in IoT Systems. (arXiv:2309.11526v1 [cs.LG])
    An important task in the field of sensor technology is the efficient implementation of adaptation procedures of measurements from one sensor to another sensor of identical design. One idea is to use the estimation of an affine transformation between different systems, which can be improved by the knowledge of experts. This paper presents an improved solution from Glacier Research that was published back in 1973. It is shown that this solution can be adapted for software calibration of sensors, implementation of expert-based adaptation, and federated learning methods. We evaluate our research with simulations and also with real measured data of a multi-sensor board with 8 identical sensors. The results show an improvement for both the simulation and the experiments with real data.
    Leveraging Negative Signals with Self-Attention for Sequential Music Recommendation. (arXiv:2309.11623v1 [cs.IR])
    Music streaming services heavily rely on their recommendation engines to continuously provide content to their consumers. Sequential recommendation consequently has seen considerable attention in current literature, where state of the art approaches focus on self-attentive models leveraging contextual information such as long and short-term user history and item features; however, most of these studies focus on long-form content domains (retail, movie, etc.) rather than short-form, such as music. Additionally, many do not explore incorporating negative session-level feedback during training. In this study, we investigate the use of transformer-based self-attentive architectures to learn implicit session-level information for sequential music recommendation. We additionally propose a contrastive learning task to incorporate negative feedback (e.g skipped tracks) to promote positive hits and penalize negative hits. This task is formulated as a simple loss term that can be incorporated into a variety of deep learning architectures for sequential recommendation. Our experiments show that this results in consistent performance gains over the baseline architectures ignoring negative user feedback.
    A survey on the semantics of sequential patterns with negation. (arXiv:2309.11638v1 [cs.LG])
    A sequential pattern with negation, or negative sequential pattern, takes the form of a sequential pattern for which the negation symbol may be used in front of some of the pattern's itemsets. Intuitively, such a pattern occurs in a sequence if negated itemsets are absent in the sequence. Recent work has shown that different semantics can be attributed to these pattern forms, and that state-of-the-art algorithms do not extract the same sets of patterns. This raises the important question of the interpretability of sequential pattern with negation. In this study, our focus is on exploring how potential users perceive negation in sequential patterns. Our aim is to determine whether specific semantics are more "intuitive" than others and whether these align with the semantics employed by one or more state-of-the-art algorithms. To achieve this, we designed a questionnaire to reveal the semantics' intuition of each user. This article presents both the design of the questionnaire and an in-depth analysis of the 124 responses obtained. The outcomes indicate that two of the semantics are predominantly intuitive; however, neither of them aligns with the semantics of the primary state-of-the-art algorithms. As a result, we provide recommendations to account for this disparity in the conclusions drawn.
    The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains. (arXiv:2309.12279v1 [cs.LG])
    Initialization of neural network weights plays a pivotal role in determining their performance. Feature Imitating Networks (FINs) offer a novel strategy by initializing weights to approximate specific closed-form statistical features, setting a promising foundation for deep learning architectures. While the applicability of FINs has been chiefly tested in biomedical domains, this study extends its exploration into other time series datasets. Three different experiments are conducted in this study to test the applicability of imitating Tsallis entropy for performance enhancement: Bitcoin price prediction, speech emotion recognition, and chronic neck pain detection. For the Bitcoin price prediction, models embedded with FINs reduced the root mean square error by around 1000 compared to the baseline. In the speech emotion recognition task, the FIN-augmented model increased classification accuracy by over 3 percent. Lastly, in the CNP detection experiment, an improvement of about 7 percent was observed compared to established classifiers. These findings validate the broad utility and potency of FINs in diverse applications.  ( 2 min )
    Turaco: Complexity-Guided Data Sampling for Training Neural Surrogates of Programs. (arXiv:2309.11726v1 [cs.PL])
    Programmers and researchers are increasingly developing surrogates of programs, models of a subset of the observable behavior of a given program, to solve a variety of software development challenges. Programmers train surrogates from measurements of the behavior of a program on a dataset of input examples. A key challenge of surrogate construction is determining what training data to use to train a surrogate of a given program. We present a methodology for sampling datasets to train neural-network-based surrogates of programs. We first characterize the proportion of data to sample from each region of a program's input space (corresponding to different execution paths of the program) based on the complexity of learning a surrogate of the corresponding execution path. We next provide a program analysis to determine the complexity of different paths in a program. We evaluate these results on a range of real-world programs, demonstrating that complexity-guided sampling results in empirical improvements in accuracy.  ( 2 min )
    How Robust is Google's Bard to Adversarial Image Attacks?. (arXiv:2309.11751v1 [cs.CV])
    Multimodal Large Language Models (MLLMs) that integrate text and other modalities (especially vision) have achieved unprecedented performance in various multimodal tasks. However, due to the unsolved adversarial robustness problem of vision models, MLLMs can have more severe safety and security risks by introducing the vision inputs. In this work, we study the adversarial robustness of Google's Bard, a competitive chatbot to ChatGPT that released its multimodal capability recently, to better understand the vulnerabilities of commercial MLLMs. By attacking white-box surrogate vision encoders or MLLMs, the generated adversarial examples can mislead Bard to output wrong image descriptions with a 22% success rate based solely on the transferability. We show that the adversarial examples can also attack other MLLMs, e.g., a 26% attack success rate against Bing Chat and a 86% attack success rate against ERNIE bot. Moreover, we identify two defense mechanisms of Bard, including face detection and toxicity detection of images. We design corresponding attacks to evade these defenses, demonstrating that the current defenses of Bard are also vulnerable. We hope this work can deepen our understanding on the robustness of MLLMs and facilitate future research on defenses. Our code is available at https://github.com/thu-ml/Attack-Bard.  ( 2 min )
    A Study of Forward-Forward Algorithm for Self-Supervised Learning. (arXiv:2309.11955v1 [cs.CV])
    Self-supervised representation learning has seen remarkable progress in the last few years, with some of the recent methods being able to learn useful image representations without labels. These methods are trained using backpropagation, the de facto standard. Recently, Geoffrey Hinton proposed the forward-forward algorithm as an alternative training method. It utilizes two forward passes and a separate loss function for each layer to train the network without backpropagation. In this study, for the first time, we study the performance of forward-forward vs. backpropagation for self-supervised representation learning and provide insights into the learned representation spaces. Our benchmark employs four standard datasets, namely MNIST, F-MNIST, SVHN and CIFAR-10, and three commonly used self-supervised representation learning techniques, namely rotation, flip and jigsaw. Our main finding is that while the forward-forward algorithm performs comparably to backpropagation during (self-)supervised training, the transfer performance is significantly lagging behind in all the studied settings. This may be caused by a combination of factors, including having a loss function for each layer and the way the supervised training is realized in the forward-forward paradigm. In comparison to backpropagation, the forward-forward algorithm focuses more on the boundaries and drops part of the information unnecessary for making decisions which harms the representation learning goal. Further investigation and research are necessary to stabilize the forward-forward strategy for self-supervised learning, to work beyond the datasets and configurations demonstrated by Geoffrey Hinton.  ( 2 min )
    Using causal inference to avoid fallouts in data-driven parametric analysis: a case study in the architecture, engineering, and construction industry. (arXiv:2309.11509v1 [cs.CE])
    The decision-making process in real-world implementations has been affected by a growing reliance on data-driven models. We investigated the synergetic pattern between the data-driven methods, empirical domain knowledge, and first-principles simulations. We showed the potential risk of biased results when using data-driven models without causal analysis. Using a case study assessing the implication of several design solutions on the energy consumption of a building, we proved the necessity of causal analysis during the data-driven modeling process. We concluded that: (a) Data-driven models' accuracy assessment or domain knowledge screening may not rule out biased and spurious results; (b) Data-driven models' feature selection should involve careful consideration of causal relationships, especially colliders; (c) Causal analysis results can be used as an aid to first-principles simulation design and parameter checking to avoid cognitive biases. We proved the benefits of causal analysis when applied to data-driven models in building engineering.  ( 2 min )
    Passage Summarization with Recurrent Models for Audio-Sheet Music Retrieval. (arXiv:2309.12111v1 [cs.SD])
    Many applications of cross-modal music retrieval are related to connecting sheet music images to audio recordings. A typical and recent approach to this is to learn, via deep neural networks, a joint embedding space that correlates short fixed-size snippets of audio and sheet music by means of an appropriate similarity structure. However, two challenges that arise out of this strategy are the requirement of strongly aligned data to train the networks, and the inherent discrepancies of musical content between audio and sheet music snippets caused by local and global tempo differences. In this paper, we address these two shortcomings by designing a cross-modal recurrent network that learns joint embeddings that can summarize longer passages of corresponding audio and sheet music. The benefits of our method are that it only requires weakly aligned audio-sheet music pairs, as well as that the recurrent network handles the non-linearities caused by tempo variations between audio and sheet music. We conduct a number of experiments on synthetic and real piano data and scores, showing that our proposed recurrent method leads to more accurate retrieval in all possible configurations.  ( 2 min )
    TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback. (arXiv:2309.11527v1 [cs.IR])
    This work describes the TrueLearn Python library, which contains a family of online learning Bayesian models for building educational (or more generally, informational) recommendation systems. This family of models was designed following the "open learner" concept, using humanly-intuitive user representations. For the sake of interpretability and putting the user in control, the TrueLearn library also contains different representations to help end-users visualise the learner models, which may in the future facilitate user interaction with their own models. Together with the library, we include a previously publicly released implicit feedback educational dataset with evaluation metrics to measure the performance of the models. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytic practitioners. The library and the support documentation with examples are available at https://truelearn.readthedocs.io/en/latest.  ( 2 min )
    Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization. (arXiv:2309.11856v1 [stat.ML])
    Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT) which demonstrated drastic reduction in memory consumption by performing quantization of the intermediate activation maps down to using INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activation maps. We experimentally analyze different block sizes and show further reduction in memory consumption (>15%), and runtime speedup per epoch (about 5%) even when performing extreme extents of quantization with similar performance trade-offs as with the original EXACT. Further, we present a correction to the assumptions on the distribution of intermediate activation maps in EXACT (assumed to be uniform) and show improved variance estimations of the quantization and dequantization steps.  ( 2 min )
    Orbital AI-based Autonomous Refuelling Solution. (arXiv:2309.11648v1 [cs.CV])
    Cameras are rapidly becoming the choice for on-board sensors towards space rendezvous due to their small form factor and inexpensive power, mass, and volume costs. When it comes to docking, however, they typically serve a secondary role, whereas the main work is done by active sensors such as lidar. This paper documents the development of a proposed AI-based (artificial intelligence) navigation algorithm intending to mature the use of on-board visible wavelength cameras as a main sensor for docking and on-orbit servicing (OOS), reducing the dependency on lidar and greatly reducing costs. Specifically, the use of AI enables the expansion of the relative navigation solution towards multiple classes of scenarios, e.g., in terms of targets or illumination conditions, which would otherwise have to be crafted on a case-by-case manner using classical image processing methods. Multiple convolutional neural network (CNN) backbone architectures are benchmarked on synthetically generated data of docking manoeuvres with the International Space Station (ISS), achieving position and attitude estimates close to 1% range-normalised and 1 deg, respectively. The integration of the solution with a physical prototype of the refuelling mechanism is validated in laboratory using a robotic arm to simulate a berthing procedure.  ( 2 min )
    Popularity Degradation Bias in Local Music Recommendation. (arXiv:2309.11671v1 [cs.IR])
    In this paper, we study the effect of popularity degradation bias in the context of local music recommendations. Specifically, we examine how accurate two top-performing recommendation algorithms, Weight Relevance Matrix Factorization (WRMF) and Multinomial Variational Autoencoder (Mult-VAE), are at recommending artists as a function of artist popularity. We find that both algorithms improve recommendation performance for more popular artists and, as such, exhibit popularity degradation bias. While both algorithms produce a similar level of performance for more popular artists, Mult-VAE shows better relative performance for less popular artists. This suggests that this algorithm should be preferred for local (long-tail) music artist recommendation.  ( 2 min )
    Multidimensional well-being of US households at a fine spatial scale using fused household surveys: fusionACS. (arXiv:2309.11512v1 [stat.AP])
    Social science often relies on surveys of households and individuals. Dozens of such surveys are regularly administered by the U.S. government. However, they field independent, unconnected samples with specialized questions, limiting research questions to those that can be answered by a single survey. The fusionACS project seeks to integrate data from multiple U.S. household surveys by statistically "fusing" variables from "donor" surveys onto American Community Survey (ACS) microdata. This results in an integrated microdataset of household attributes and well-being dimensions that can be analyzed to address research questions in ways that are not currently possible. The presented data comprise the fusion onto the ACS of select donor variables from the Residential Energy Consumption Survey (RECS) of 2015, the National Household Transportation Survey (NHTS) of 2017, the American Housing Survey (AHS) of 2019, and the Consumer Expenditure Survey - Interview (CEI) for the years 2015-2019. The underlying statistical techniques are included in an open-source $R$ package, fusionModel, that provides generic tools for the creation, analysis, and validation of fused microdata.  ( 2 min )
    Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework. (arXiv:2309.11682v1 [cs.LG])
    While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However, most proposed solutions are based on the assumption of having access to the causal graph describing the interaction of different features. Moreover, existing algorithms require full access to data and cannot be used when small batches are used (stochastic/batch implementation). This paper proposes the first stochastic distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph. More specifically, we formulate the fair inference in the presence of the distribution shift as a distributionally robust optimization problem under $L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. We then discuss how the proposed method can be implemented in a stochastic fashion. We have evaluated the presented framework's performance and efficiency through extensive experiments on real datasets consisting of distribution shifts.  ( 2 min )
    Ad-load Balancing via Off-policy Learning in a Content Marketplace. (arXiv:2309.11518v1 [cs.IR])
    Ad-load balancing is a critical challenge in online advertising systems, particularly in the context of social media platforms, where the goal is to maximize user engagement and revenue while maintaining a satisfactory user experience. This requires the optimization of conflicting objectives, such as user satisfaction and ads revenue. Traditional approaches to ad-load balancing rely on static allocation policies, which fail to adapt to changing user preferences and contextual factors. In this paper, we present an approach that leverages off-policy learning and evaluation from logged bandit feedback. We start by presenting a motivating analysis of the ad-load balancing problem, highlighting the conflicting objectives between user satisfaction and ads revenue. We emphasize the nuances that arise due to user heterogeneity and the dependence on the user's position within a session. Based on this analysis, we define the problem as determining the optimal ad-load for a particular feed fetch. To tackle this problem, we propose an off-policy learning framework that leverages unbiased estimators such as Inverse Propensity Scoring (IPS) and Doubly Robust (DR) to learn and estimate the policy values using offline collected stochastic data. We present insights from online A/B experiments deployed at scale across over 80 million users generating over 200 million sessions, where we find statistically significant improvements in both user satisfaction metrics and ads revenue for the platform.  ( 2 min )
    Towards Differential Privacy in Sequential Recommendation: A Noisy Graph Neural Network Approach. (arXiv:2309.11515v1 [cs.CR])
    With increasing frequency of high-profile privacy breaches in various online platforms, users are becoming more concerned about their privacy. And recommender system is the core component of online platforms for providing personalized service, consequently, its privacy preservation has attracted great attention. As the gold standard of privacy protection, differential privacy has been widely adopted to preserve privacy in recommender systems. However, existing differentially private recommender systems only consider static and independent interactions, so they cannot apply to sequential recommendation where behaviors are dynamic and dependent. Meanwhile, little attention has been paid on the privacy risk of sensitive user features, most of them only protect user feedbacks. In this work, we propose a novel DIfferentially Private Sequential recommendation framework with a noisy Graph Neural Network approach (denoted as DIPSGNN) to address these limitations. To the best of our knowledge, we are the first to achieve differential privacy in sequential recommendation with dependent interactions. Specifically, in DIPSGNN, we first leverage piecewise mechanism to protect sensitive user features. Then, we innovatively add calibrated noise into aggregation step of graph neural network based on aggregation perturbation mechanism. And this noisy graph neural network can protect sequentially dependent interactions and capture user preferences simultaneously. Extensive experiments demonstrate the superiority of our method over state-of-the-art differentially private recommender systems in terms of better balance between privacy and accuracy.  ( 2 min )
    Latent Diffusion Models for Structural Component Design. (arXiv:2309.11601v1 [cs.LG])
    Recent advances in generative modeling, namely Diffusion models, have revolutionized generative modeling, enabling high-quality image generation tailored to user needs. This paper proposes a framework for the generative design of structural components. Specifically, we employ a Latent Diffusion model to generate potential designs of a component that can satisfy a set of problem-specific loading conditions. One of the distinct advantages our approach offers over other generative approaches, such as generative adversarial networks (GANs), is that it permits the editing of existing designs. We train our model using a dataset of geometries obtained from structural topology optimization utilizing the SIMP algorithm. Consequently, our framework generates inherently near-optimal designs. Our work presents quantitative results that support the structural performance of the generated designs and the variability in potential candidate designs. Furthermore, we provide evidence of the scalability of our framework by operating over voxel domains with resolutions varying from $32^3$ to $128^3$. Our framework can be used as a starting point for generating novel near-optimal designs similar to topology-optimized designs.  ( 2 min )
  • Open

    Activation Compression of Graph Neural Networks using Block-wise Quantization with Improved Variance Minimization. (arXiv:2309.11856v1 [stat.ML])
    Efficient training of large-scale graph neural networks (GNNs) has been studied with a specific focus on reducing their memory consumption. Work by Liu et al. (2022) proposed extreme activation compression (EXACT) which demonstrated drastic reduction in memory consumption by performing quantization of the intermediate activation maps down to using INT2 precision. They showed little to no reduction in performance while achieving large reductions in GPU memory consumption. In this work, we present an improvement to the EXACT strategy by using block-wise quantization of the intermediate activation maps. We experimentally analyze different block sizes and show further reduction in memory consumption (>15%), and runtime speedup per epoch (about 5%) even when performing extreme extents of quantization with similar performance trade-offs as with the original EXACT. Further, we present a correction to the assumptions on the distribution of intermediate activation maps in EXACT (assumed to be uniform) and show improved variance estimations of the quantization and dequantization steps.  ( 2 min )
    On the Probability of Immunity. (arXiv:2309.11942v1 [stat.ME])
    This work is devoted to the study of the probability of immunity, i.e. the effect occurs whether exposed or not. We derive necessary and sufficient conditions for non-immunity and $\epsilon$-bounded immunity, i.e. the probability of immunity is zero and $\epsilon$-bounded, respectively. The former allows us to estimate the probability of benefit (i.e., the effect occurs if and only if exposed) from a randomized controlled trial, and the latter allows us to produce bounds of the probability of benefit that are tighter than the existing ones. We also introduce the concept of indirect immunity (i.e., through a mediator) and repeat our previous analysis for it. Finally, we propose a method for sensitivity analysis of the probability of immunity under unmeasured confounding.
    Grassmann Manifold Flows for Stable Shape Generation. (arXiv:2211.02900v2 [cs.LG] UPDATED)
    Recently, studies on machine learning have focused on methods that use symmetry implicit in a specific manifold as an inductive bias. Grassmann manifolds provide the ability to handle fundamental shapes represented as shape spaces, enabling stable shape analysis. In this paper, we present a novel approach in which we establish the theoretical foundations for learning distributions on the Grassmann manifold via continuous normalization flows, with the explicit goal of generating stable shapes. Our approach facilitates more robust generation by effectively eliminating the influence of extraneous transformations, such as rotations and inversions, through learning and generating within a Grassmann manifolds designed to accommodate the essential shape information of the object. The experimental results indicated that the proposed method can generate high-quality samples by capturing the data structure. Furthermore, the proposed method significantly outperformed state-of-the-art methods in terms of the log-likelihood or evidence lower bound. The results obtained are expected to stimulate further research in this field, leading to advances for stable shape generation and analysis.
    E-detectors: a nonparametric framework for sequential change detection. (arXiv:2203.03532v3 [stat.ME] UPDATED)
    Sequential change detection is a classical problem with a variety of applications. However, the majority of prior work has been parametric, for example, focusing on exponential families. We develop a fundamentally new and general framework for sequential change detection when the pre- and post-change distributions are nonparametrically specified (and thus composite). Our procedures come with clean, nonasymptotic bounds on the average run length (frequency of false alarms). In certain nonparametric cases (like sub-Gaussian or sub-exponential), we also provide near-optimal bounds on the detection delay following a changepoint. The primary technical tool that we introduce is called an \emph{e-detector}, which is composed of sums of e-processes -- a fundamental generalization of nonnegative supermartingales -- that are started at consecutive times. We first introduce simple Shiryaev-Roberts and CUSUM-style e-detectors, and then show how to design their mixtures in order to achieve both statistical and computational efficiency. Our e-detector framework can be instantiated to recover classical likelihood-based procedures for parametric problems, as well as yielding the first change detection method for many nonparametric problems. As a running example, we tackle the problem of detecting changes in the mean of a bounded random variable without i.i.d. assumptions, with an application to tracking the performance of a basketball team over multiple seasons.
    Model-based Clustering using Non-parametric Hidden Markov Models. (arXiv:2309.12238v1 [math.ST])
    Thanks to their dependency structure, non-parametric Hidden Markov Models (HMMs) are able to handle model-based clustering without specifying group distributions. The aim of this work is to study the Bayes risk of clustering when using HMMs and to propose associated clustering procedures. We first give a result linking the Bayes risk of classification and the Bayes risk of clustering, which we use to identify the key quantity determining the difficulty of the clustering task. We also give a proof of this result in the i.i.d. framework, which might be of independent interest. Then we study the excess risk of the plugin classifier. All these results are shown to remain valid in the online setting where observations are clustered sequentially. Simulations illustrate our findings.
    Multiclass Learnability Does Not Imply Sample Compression. (arXiv:2308.06424v2 [cs.LG] UPDATED)
    A hypothesis class admits a sample compression scheme, if for every sample labeled by a hypothesis from the class, it is possible to retain only a small subsample, using which the labels on the entire sample can be inferred. The size of the compression scheme is an upper bound on the size of the subsample produced. Every learnable binary hypothesis class (which must necessarily have finite VC dimension) admits a sample compression scheme of size only a finite function of its VC dimension, independent of the sample size. For multiclass hypothesis classes, the analog of VC dimension is the DS dimension. We show that the analogous statement pertaining to sample compression is not true for multiclass hypothesis classes: every learnable multiclass hypothesis class, which must necessarily have finite DS dimension, does not admit a sample compression scheme of size only a finite function of its DS dimension.  ( 2 min )
    Statistical Complexity of Quantum Learning. (arXiv:2309.11617v1 [quant-ph])
    Recent years have seen significant activity on the problem of using data for the purpose of learning properties of quantum systems or of processing classical or quantum data via quantum computing. As in classical learning, quantum learning problems involve settings in which the mechanism generating the data is unknown, and the main goal of a learning algorithm is to ensure satisfactory accuracy levels when only given access to data and, possibly, side information such as expert knowledge. This article reviews the complexity of quantum learning using information-theoretic techniques by focusing on data complexity, copy complexity, and model complexity. Copy complexity arises from the destructive nature of quantum measurements, which irreversibly alter the state to be processed, limiting the information that can be extracted about quantum data. For example, in a quantum system, unlike in classical machine learning, it is generally not possible to evaluate the training loss simultaneously on multiple hypotheses using the same quantum data. To make the paper self-contained and approachable by different research communities, we provide extensive background material on classical results from statistical learning theory, as well as on the distinguishability of quantum states. Throughout, we highlight the differences between quantum and classical learning by addressing both supervised and unsupervised learning, and we provide extensive pointers to the literature.
    Online Self-Concordant and Relatively Smooth Minimization, With Applications to Online Portfolio Selection and Learning Quantum States. (arXiv:2210.00997v3 [stat.ML] UPDATED)
    Consider an online convex optimization problem where the loss functions are self-concordant barriers, smooth relative to a convex function $h$, and possibly non-Lipschitz. We analyze the regret of online mirror descent with $h$. Then, based on the result, we prove the following in a unified manner. Denote by $T$ the time horizon and $d$ the parameter dimension. 1. For online portfolio selection, the regret of $\widetilde{\text{EG}}$, a variant of exponentiated gradient due to Helmbold et al., is $\tilde{O} ( T^{2/3} d^{1/3} )$ when $T > 4 d / \log d$. This improves on the original $\tilde{O} ( T^{3/4} d^{1/2} )$ regret bound for $\widetilde{\text{EG}}$. 2. For online portfolio selection, the regret of online mirror descent with the logarithmic barrier is $\tilde{O}(\sqrt{T d})$. The regret bound is the same as that of Soft-Bayes due to Orseau et al. up to logarithmic terms. 3. For online learning quantum states with the logarithmic loss, the regret of online mirror descent with the log-determinant function is also $\tilde{O} ( \sqrt{T d} )$. Its per-iteration time is shorter than all existing algorithms we know.
    Proportional Response: Contextual Bandits for Simple and Cumulative Regret Minimization. (arXiv:2307.02108v2 [cs.LG] UPDATED)
    Simple regret minimization is a critical problem in learning optimal treatment assignment policies across various domains, including healthcare and e-commerce. However, it remains understudied in the contextual bandit setting. We propose a new family of computationally efficient bandit algorithms for the stochastic contextual bandit settings, with the flexibility to be adapted for cumulative regret minimization (with near-optimal minimax guarantees) and simple regret minimization (with SOTA guarantees). Furthermore, our algorithms adapt to model misspecification and extend to the continuous arm settings. These advantages come from constructing and relying on "conformal arm sets" (CASs), which provide a set of arms at every context that encompass the context-specific optimal arm with some probability across the context distribution. Our positive results on simple and cumulative regret guarantees are contrasted by a negative result, which shows that an algorithm can't achieve instance-dependent simple regret guarantees while simultaneously achieving minimax optimal cumulative regret guarantees.  ( 2 min )
    fakenewsbr: A Fake News Detection Platform for Brazilian Portuguese. (arXiv:2309.11052v2 [cs.CL] UPDATED)
    The proliferation of fake news has become a significant concern in recent times due to its potential to spread misinformation and manipulate public opinion. This paper presents a comprehensive study on detecting fake news in Brazilian Portuguese, focusing on journalistic-type news. We propose a machine learning-based approach that leverages natural language processing techniques, including TF-IDF and Word2Vec, to extract features from textual data. We evaluate the performance of various classification algorithms, such as logistic regression, support vector machine, random forest, AdaBoost, and LightGBM, on a dataset containing both true and fake news articles. The proposed approach achieves high accuracy and F1-Score, demonstrating its effectiveness in identifying fake news. Additionally, we developed a user-friendly web platform, fakenewsbr.com, to facilitate the verification of news articles' veracity. Our platform provides real-time analysis, allowing users to assess the likelihood of fake news articles. Through empirical analysis and comparative studies, we demonstrate the potential of our approach to contribute to the fight against the spread of fake news and promote more informed media consumption.
    Human-in-the-Loop Causal Discovery under Latent Confounding using Ancestral GFlowNets. (arXiv:2309.12032v1 [cs.LG])
    Structure learning is the crux of causal inference. Notably, causal discovery (CD) algorithms are brittle when data is scarce, possibly inferring imprecise causal relations that contradict expert knowledge -- especially when considering latent confounders. To aggravate the issue, most CD methods do not provide uncertainty estimates, making it hard for users to interpret results and improve the inference process. Surprisingly, while CD is a human-centered affair, no works have focused on building methods that both 1) output uncertainty estimates that can be verified by experts and 2) interact with those experts to iteratively refine CD. To solve these issues, we start by proposing to sample (causal) ancestral graphs proportionally to a belief distribution based on a score function, such as the Bayesian information criterion (BIC), using generative flow networks. Then, we leverage the diversity in candidate graphs and introduce an optimal experimental design to iteratively probe the expert about the relations among variables, effectively reducing the uncertainty of our belief over ancestral graphs. Finally, we update our samples to incorporate human feedback via importance sampling. Importantly, our method does not require causal sufficiency (i.e., unobserved confounders may exist). Experiments with synthetic observational data show that our method can accurately sample from distributions over ancestral graphs and that we can greatly improve inference quality with human aid.  ( 2 min )
    TrueLearn: A Python Library for Personalised Informational Recommendations with (Implicit) Feedback. (arXiv:2309.11527v1 [cs.IR])
    This work describes the TrueLearn Python library, which contains a family of online learning Bayesian models for building educational (or more generally, informational) recommendation systems. This family of models was designed following the "open learner" concept, using humanly-intuitive user representations. For the sake of interpretability and putting the user in control, the TrueLearn library also contains different representations to help end-users visualise the learner models, which may in the future facilitate user interaction with their own models. Together with the library, we include a previously publicly released implicit feedback educational dataset with evaluation metrics to measure the performance of the models. The extensive documentation and coding examples make the library highly accessible to both machine learning developers and educational data mining and learning analytic practitioners. The library and the support documentation with examples are available at https://truelearn.readthedocs.io/en/latest.
    Bayesian sparsification for deep neural networks with Bayesian model reduction. (arXiv:2309.12095v1 [stat.ML])
    Deep learning's immense capabilities are often constrained by the complexity of its models, leading to an increasing demand for effective sparsification techniques. Bayesian sparsification for deep learning emerges as a crucial approach, facilitating the design of models that are both computationally efficient and competitive in terms of performance across various deep learning applications. The state-of-the-art -- in Bayesian sparsification of deep neural networks -- combines structural shrinkage priors on model weights with an approximate inference scheme based on black-box stochastic variational inference. However, model inversion of the full generative model is exceptionally computationally demanding, especially when compared to standard deep learning of point estimates. In this context, we advocate for the use of Bayesian model reduction (BMR) as a more efficient alternative for pruning of model weights. As a generalization of the Savage-Dickey ratio, BMR allows a post-hoc elimination of redundant model weights based on the posterior estimates under a straightforward (non-hierarchical) generative model. Our comparative study highlights the computational efficiency and the pruning rate of the BMR method relative to the established stochastic variational inference (SVI) scheme, when applied to the full hierarchical generative model. We illustrate the potential of BMR to prune model parameters across various deep learning architectures, from classical networks like LeNet to modern frameworks such as Vision Transformers and MLP-Mixers.
    Dr. FERMI: A Stochastic Distributionally Robust Fair Empirical Risk Minimization Framework. (arXiv:2309.11682v1 [cs.LG])
    While training fair machine learning models has been studied extensively in recent years, most developed methods rely on the assumption that the training and test data have similar distributions. In the presence of distribution shifts, fair models may behave unfairly on test data. There have been some developments for fair learning robust to distribution shifts to address this shortcoming. However, most proposed solutions are based on the assumption of having access to the causal graph describing the interaction of different features. Moreover, existing algorithms require full access to data and cannot be used when small batches are used (stochastic/batch implementation). This paper proposes the first stochastic distributionally robust fairness framework with convergence guarantees that do not require knowledge of the causal graph. More specifically, we formulate the fair inference in the presence of the distribution shift as a distributionally robust optimization problem under $L_p$ norm uncertainty sets with respect to the Exponential Renyi Mutual Information (ERMI) as the measure of fairness violation. We then discuss how the proposed method can be implemented in a stochastic fashion. We have evaluated the presented framework's performance and efficiency through extensive experiments on real datasets consisting of distribution shifts.
    Prodigy: An Expeditiously Adaptive Parameter-Free Learner. (arXiv:2306.06101v2 [cs.LG] UPDATED)
    We consider the problem of estimating the learning rate in adaptive methods, such as Adagrad and Adam. We describe two techniques, Prodigy and Resetting, to provably estimate the distance to the solution $D$, which is needed to set the learning rate optimally. Our techniques are modifications of the D-Adaptation method for learning-rate-free learning. Our methods improve upon the convergence rate of D-Adaptation by a factor of $O(\sqrt{\log(D/d_0)})$, where $d_0$ is the initial estimate of $D$. We test our methods on 12 common logistic-regression benchmark datasets, VGG11 and ResNet-50 training on CIFAR10, ViT training on Imagenet, LSTM training on IWSLT14, DLRM training on Criteo dataset, VarNet on Knee MRI dataset, as well as RoBERTa and GPT transformer training on BookWiki. Our experimental results show that our approaches consistently outperform D-Adaptation and reach test accuracy values close to that of hand-tuned Adam.  ( 2 min )
    Quasi-Monte Carlo for 3D Sliced Wasserstein. (arXiv:2309.11713v1 [stat.ML])
    Monte Carlo (MC) approximation has been used as the standard computation approach for the Sliced Wasserstein (SW) distance, which has an intractable expectation in its analytical form. However, the MC method is not optimal in terms of minimizing the absolute approximation error. To provide a better class of empirical SW, we propose quasi-sliced Wasserstein (QSW) approximations that rely on Quasi-Monte Carlo (QMC) methods. For a comprehensive investigation of QMC for SW, we focus on the 3D setting, specifically computing the SW between probability measures in three dimensions. In greater detail, we empirically verify various ways of constructing QMC points sets on the 3D unit-hypersphere, including Gaussian-based mapping, equal area mapping, generalized spiral points, and optimizing discrepancy energies. Furthermore, to obtain an unbiased estimation for stochastic optimization, we extend QSW into Randomized Quasi-Sliced Wasserstein (RQSW) by introducing randomness to the discussed low-discrepancy sequences. For theoretical properties, we prove the asymptotic convergence of QSW and the unbiasedness of RQSW. Finally, we conduct experiments on various 3D tasks, such as point-cloud comparison, point-cloud interpolation, image style transfer, and training deep point-cloud autoencoders, to demonstrate the favorable performance of the proposed QSW and RQSW variants.  ( 2 min )
    GLM Regression with Oblivious Corruptions. (arXiv:2309.11657v1 [cs.DS])
    We demonstrate the first algorithms for the problem of regression for generalized linear models (GLMs) in the presence of additive oblivious noise. We assume we have sample access to examples $(x, y)$ where $y$ is a noisy measurement of $g(w^* \cdot x)$. In particular, \new{the noisy labels are of the form} $y = g(w^* \cdot x) + \xi + \epsilon$, where $\xi$ is the oblivious noise drawn independently of $x$ \new{and satisfies} $\Pr[\xi = 0] \geq o(1)$, and $\epsilon \sim \mathcal N(0, \sigma^2)$. Our goal is to accurately recover a \new{parameter vector $w$ such that the} function $g(w \cdot x)$ \new{has} arbitrarily small error when compared to the true values $g(w^* \cdot x)$, rather than the noisy measurements $y$. We present an algorithm that tackles \new{this} problem in its most general distribution-independent setting, where the solution may not \new{even} be identifiable. \new{Our} algorithm returns \new{an accurate estimate of} the solution if it is identifiable, and otherwise returns a small list of candidates, one of which is close to the true solution. Furthermore, we \new{provide} a necessary and sufficient condition for identifiability, which holds in broad settings. \new{Specifically,} the problem is identifiable when the quantile at which $\xi + \epsilon = 0$ is known, or when the family of hypotheses does not contain candidates that are nearly equal to a translated $g(w^* \cdot x) + A$ for some real number $A$, while also having large error when compared to $g(w^* \cdot x)$. This is the first \new{algorithmic} result for GLM regression \new{with oblivious noise} which can handle more than half the samples being arbitrarily corrupted. Prior work focused largely on the setting of linear regression, and gave algorithms under restrictive assumptions.  ( 3 min )
    On the different regimes of Stochastic Gradient Descent. (arXiv:2309.10688v2 [cs.LG] UPDATED)
    Modern deep networks are trained with stochastic gradient descent (SGD) whose key parameters are the number of data considered at each step or batch size $B$, and the step size or learning rate $\eta$. For small $B$ and large $\eta$, SGD corresponds to a stochastic evolution of the parameters, whose noise amplitude is governed by the `temperature' $T\equiv \eta/B$. Yet this description is observed to break down for sufficiently large batches $B\geq B^*$, or simplifies to gradient descent (GD) when the temperature is sufficiently small. Understanding where these cross-overs take place remains a central challenge. Here we resolve these questions for a teacher-student perceptron classification model, and show empirically that our key predictions still apply to deep networks. Specifically, we obtain a phase diagram in the $B$-$\eta$ plane that separates three dynamical phases: $\textit{(i)}$ a noise-dominated SGD governed by temperature, $\textit{(ii)}$ a large-first-step-dominated SGD and $\textit{(iii)}$ GD. These different phases also corresponds to different regimes of generalization error. Remarkably, our analysis reveals that the batch size $B^*$ separating regimes $\textit{(i)}$ and $\textit{(ii)}$ scale with the size $P$ of the training set, with an exponent that characterizes the hardness of the classification problem.  ( 2 min )
    Regionally Additive Models: Explainable-by-design models minimizing feature interactions. (arXiv:2309.12215v1 [cs.LG])
    Generalized Additive Models (GAMs) are widely used explainable-by-design models in various applications. GAMs assume that the output can be represented as a sum of univariate functions, referred to as components. However, this assumption fails in ML problems where the output depends on multiple features simultaneously. In these cases, GAMs fail to capture the interaction terms of the underlying function, leading to subpar accuracy. To (partially) address this issue, we propose Regionally Additive Models (RAMs), a novel class of explainable-by-design models. RAMs identify subregions within the feature space where interactions are minimized. Within these regions, it is more accurate to express the output as a sum of univariate functions (components). Consequently, RAMs fit one component per subregion of each feature instead of one component per feature. This approach yields a more expressive model compared to GAMs while retaining interpretability. The RAM framework consists of three steps. Firstly, we train a black-box model. Secondly, using Regional Effect Plots, we identify subregions where the black-box model exhibits near-local additivity. Lastly, we fit a GAM component for each identified subregion. We validate the effectiveness of RAMs through experiments on both synthetic and real-world datasets. The results confirm that RAMs offer improved expressiveness compared to GAMs while maintaining interpretability.  ( 2 min )
    Global universal approximation of functional input maps on weighted spaces. (arXiv:2306.03303v2 [stat.ML] UPDATED)
    We introduce so-called functional input neural networks defined on a possibly infinite dimensional weighted space with values also in a possibly infinite dimensional output space. To this end, we use an additive family as hidden layer maps and a non-linear activation function applied to each hidden layer. Relying on Stone-Weierstrass theorems on weighted spaces, we can prove a global universal approximation result for generalizations of continuous functions going beyond the usual approximation on compact sets. This then applies in particular to approximation of (non-anticipative) path space functionals via functional input neural networks. As a further application of the weighted Stone-Weierstrass theorem we prove a global universal approximation result for linear functions of the signature. We also introduce the viewpoint of Gaussian process regression in this setting and show that the reproducing kernel Hilbert space of the signature kernels are Cameron-Martin spaces of certain Gaussian processes. This paves the way towards uncertainty quantification for signature kernel regression.  ( 2 min )
    The Broad Impact of Feature Imitation: Neural Enhancements Across Financial, Speech, and Physiological Domains. (arXiv:2309.12279v1 [cs.LG])
    Initialization of neural network weights plays a pivotal role in determining their performance. Feature Imitating Networks (FINs) offer a novel strategy by initializing weights to approximate specific closed-form statistical features, setting a promising foundation for deep learning architectures. While the applicability of FINs has been chiefly tested in biomedical domains, this study extends its exploration into other time series datasets. Three different experiments are conducted in this study to test the applicability of imitating Tsallis entropy for performance enhancement: Bitcoin price prediction, speech emotion recognition, and chronic neck pain detection. For the Bitcoin price prediction, models embedded with FINs reduced the root mean square error by around 1000 compared to the baseline. In the speech emotion recognition task, the FIN-augmented model increased classification accuracy by over 3 percent. Lastly, in the CNP detection experiment, an improvement of about 7 percent was observed compared to established classifiers. These findings validate the broad utility and potency of FINs in diverse applications.  ( 2 min )

  • Open

    This is an actual barcode created by AI as a piece of art. Scan it for its secret message if you don't believe me..
    submitted by /u/glenniszen [link] [comments]  ( 8 min )
    Issue with voice.ai — no sound comes out after feeding a prerecorded audio clip through the voice filter!
    So I created a voice profile based on recordings of my own voice over on voice.ai. The audio was crisp, loud and clear. I only added about 16 minutes if it though. (A minimum of 15 minutes is recommended) When I take a prerecorded voice clip and feed it through the voice profile I created, it’s just silent for all 15 seconds that it grants you before you upgrade. Then it has the little 2 second audio watermark at the end, where it says “voice AI,” and that I can hear perfectly fine! I’ve searched all over for others who might have had this problem, but it seems like the main problem people have reported difficulties with audio are/were tend to center around the live mode failing to function properly. Has anyone had this issue before? Should I add more audio to the voice profile to make it more well rounded perhaps? If so, how much audio do you typically need to create a solid custom voice profile? Should I just update to the paid version? I don’t really think that would fix it, and I wanted to wait to upgrade until I had some proof that it worked, yanno? I’m not sure what to do… Any ideas? submitted by /u/WhenTheFoxGRINS [link] [comments]  ( 9 min )
    Looking for the best AI Story Generator? Meet NovelGPT.
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    AI Startup Buzz Is Facing a Reality Check
    Founders and venture capitalists who flocked to artificial-intelligence startups are learning that turning the chatbot buzz into successful businesses is harder than it seems. Source https://www.wsj.com/tech/ai/ai-startup-buzz-is-facing-a-reality-check-e34babfe submitted by /u/NuseAI [link] [comments]  ( 8 min )
    AI — weekly megathread!
    News provided by aibrews.com Genmo releases a new text-to-video model: Genmo Replay v0.1, which generate high-quality videos from text without the need for advanced prompt engineering. Genmo is available for free to create AI videos [Details | Genmo Replay] . OpenAI unveils DALL·E 3 - a major update to the text-to-image model, which will be integrated in ChatGPT. It will be available to ChatGPT Plus and Enterprise users in October, via the API and in Labs later this fall. Creators can now also opt their images out from future training [Details]. Toyota Research Institute has developed a technique, powered by generative AI, that enables teaching robots new manipulation abilities in a single afternoon. Using the same robot, same code, and same setup, TRI taught over 60 different dextero…  ( 11 min )
    LongLoRA: New method extends LLAMA2 7B to 100k context length, 70B to 32k context length on on a single 8 × A100 machine
    As AI models get bigger, training them requires more and more computing power. Researchers are looking for ways to train these large AI models without needing Google-scale resources. A new paper proposes LongLoRA, a fine-tuning approach that can extend LLaMA2 7B to 100k context length and 70B model to 32k context length on a single 8× A100 machine. Here are my highlights from the paper: Big one of course: LongLoRA efficiently fine-tunes large AI models on longer texts Key points: Approximates standard attention via "shift short attention" during training Tuning only a subset of weights (LoRA) plus some embeddings & norms Fine-tuned 7B parameter model on 100k tokens with 1 machine Way lower training cost than full fine-tuning for large contexts Close to full fine-tuning performance The core insight is that an approximation of full attention enables efficient training while retaining standard attention for final inference. Combined with selective weight tuning, this really reduces compute needs. I think this demonstrates the potential to train more capable AI without unreasonable resources. Efficient training techniques = more powerful LLMs for the same resources. Full summary here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    Information technology industry is at 35% of AI adoption in the US
    It's currently on the 4th place after marketing, consulting, and accounting. And is mostly used in software testing field to: achieve more accurate results have larger test coverage receive low learning curve get faster QA completion I was actually quite surprised when I read, I thought that information technology would at least be in the TOP-3 industries. Do you think it can happen by the end of this year? submitted by /u/unbalanced_mind [link] [comments]  ( 9 min )
    Why ChatGPT isn’t conscious – but future AI systems might be | "Different theories of consciousness suggest some basic properties we might expect a conscious system to have"
    submitted by /u/Tao_Dragon [link] [comments]  ( 8 min )
    want to get GTA 4 navigation voice into Google Maps is it possible using AI?
    hello everyone there was a voice-guided navigation system for GTA 4 with a female voice, I would love to use that as my Google Maps and Waze navigation voice I tried looking for anything online but the mods available are of very inferior quality and from a few years ago which won't work properly with Google Maps as a lot of the voice commands are missing the sample for the voice can be found in these 2 YouTube videos first sample 11 minutes repeated statements 2nd sample 1 minute without repeated statements Using Adobe podcast I can remove a lot of the background noise The problem is I don't know much about sound tuning so using whatever free ai I could find on the internet the voice is not being cloned properly it is sounding mechanical or just like google ttss i have a decen…  ( 10 min )
    DALL-E 3 & ChatGPT: The Game-Changing AI Tool for Text-to-Image Generation
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Free Unlimited Face Swap Tool You Can Use in Browser
    submitted by /u/Vegetable_Tutor8245 [link] [comments]  ( 8 min )
    Claude LLM, draft a strong speech for a world leader predicting that by 2030 our AI revolution could make the world so wonderful that we can safely close our prisons.
    Here is a draft speech focusing on the potential for AI to help create a more just and peaceful world by 2030: My fellow citizens, We stand today on the cusp of a new era. The rapid development of artificial intelligence technology presents us with an unparalleled opportunity to remake our world for the better. Though this technology poses risks, I believe that if guided responsibly, AI can help us build a society of prosperity, equity, and justice. Over the past decade, we have already seen glimpses of AI's vast potential for good. Algorithms are helping doctors diagnose diseases more accurately. Smart systems are optimizing transportation networks and reducing traffic accidents. Machine learning is enabling more efficient energy use and sustainable agriculture. But this is only the…  ( 10 min )
  • Open

    [N] Splash Pro drops generative music model and comparison to other models
    Seems like a strong contender in this space, plus comercial use: https://www.splashmusic.com/music-generation submitted by /u/No-Reference8836 [link] [comments]  ( 8 min )
    [R] BTLM-3B-8K: 7B Parameter Performance in a 3B Parameter Model
    Hello Reddit! I'm Daria from the Cerebras ML team, and I've got some thrilling updates to share with you! 🚀 We've recently published our BTLM-3B-8K paper on arXiv, and I’m excited to share that I am one of the authors! Check it out here: https://arxiv.org/abs/2309.11568 It distills our recipe for training SOTA LLMs: An extensively deduplicated dataset: SlimPajama Hyperparameter search using muP Variable sequence length training + ALiBi Aggressive LR decay Our BTLM-3B-8K not only sets a new standard for 3B parameter models but also frequently surpasses the performance of 7B models! In the paper, we meticulously outline how we developed the BTLM model and conducted a comprehensive analysis of its performance on 22 validation benchmarks. These benchmarks span a range of capabilities including common sense reasoning, world knowledge, reading comprehension, code generation, long sequence interpolation/extrapolation, bias, toxicity, and misinformation. For those eager to dive in, we've made our SlimPajama dataset and the BTLM-3B-8K model available on Hugging Face: https://huggingface.co/cerebras 🎉 Feel free to delve into the details, explore the dataset and model, and let us know your thoughts, insights, or questions! We're here to discuss and excited to hear your feedback. Happy exploring! 🚀 submitted by /u/daria-sobol [link] [comments]  ( 9 min )
    [D]: Is There Any Followup To Effect Of Model Size on LoRA Rank "r"?
    Hello all, I am re-reading the LoRA paper (https://arxiv.org/abs/2106.09685) to get a deeper understanding of some of the analysis the authors perform at the end and saw this line Note that the relationship between model size and the optimal rank for adaptation is still an open question. Does anybody know of any resources out there that looked into this question, given that LoRA has been around for a little bit now? Perhaps someone has performed similar subspace overlap / optimal "r" value studies on some of the LLMs that fall in-between GPT2 and 3, i.e. some of the ~7B, ~15B, ~40B and ~70B models? submitted by /u/lightSpeedBrick [link] [comments]  ( 9 min )
    [D] When do we use the instruct version of a LLM?
    If censorship isn’t an issue for me, when there’s an instruct version of an LLM, is it generally always better to use the instruct version than the non-instruct version (because instruct versions tend to hallucinate less)? Apart from censorship and hallucinations, are there any other pros and cons between intrust vs. non-instruct version? submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [R] How to reduce hallucinations using Chain Of Verification in Large Language Models
    This new paper from Shehzaad Dhuliawala et al. (2023) introduces a combination of prompting and consistency checks made by the LLM itself. Implementing this technique actually made me like gpt-3.5 again ! I wrote a tutorial on how to actually implement this method : https://advanced-stack.com/resources/how-to-reduce-hallucinations-using-chain-of-verification-cov-in-large-language-models.html Let me know if you find it useful ​ submitted by /u/Fluid-Age-9266 [link] [comments]  ( 9 min )
    [D] Looking for suggestions
    Hi guys, a final year CS student here. I want to create a portfolio to showcase my skills in ML and DL. I have knowledge in docker and have access to google cloud platform to deploy. Now, I am unable to find any project that stands out. Could u suggest something that I could learn from as well as looks pretty on my cv? submitted by /u/Virtual_Heron_7417 [link] [comments]  ( 9 min )
    Transformers: I can't fathom the concept of dynamic weights in attention heads [R]
    Hey everyone, I've been diving deep into the world of neural networks, and recently, I've been particularly intrigued by the dynamicity of attention head parameters (weights). These weights play a crucial role in transformers, and understanding how they change during training and inference can provide valuable insights into model behavior. The question is, what does dynamic mean in this context? Is it input-adaptive? Do weights change at inference time according to inputs? I have a hard time understanding this concept, for me, weights are static and pre-established at training time. submitted by /u/assalas23 [link] [comments]  ( 9 min )
    [D] Transformers: I can't fathom the concept of dynamic weights in attention heads
    Hey everyone, I've been diving deep into the world of neural networks, and recently, I've been particularly intrigued by the dynamicity of attention head parameters (weights). These weights play a crucial role in transformers, and understanding how they change during training and inference can provide valuable insights into model behavior. The question is, what does dynamic mean in this context? Is it input-adaptive? Do weights change at inference time according to inputs? I have a hard time understanding this concept, for me, weights are static and pre-established at training time. submitted by /u/assalas23 [link] [comments]  ( 9 min )
    Help Shape the Future of A.I.! Take a Quick 2-minute Survey for Academic Research [R]
    Hello guys! ​ Are you intrigued by the world of Artificial Intelligence? I am conducting a brief and insightful survey on AI for academic research purposes. It'll take just 2 minutes of your time, and your valuable insights will contribute to our understanding of AI trends and perspectives. Whether you're a tech enthusiast, a data wizard, or just curious about the future of AI, your input is incredibly valuable! ​ Click the link below to share your thoughts and help to understand this new technology: ​ https://forms.gle/7fbbkc1f2iBPXHJV7 ​ Thank you in advance for being a part of this exciting AI study! #AIResearch #AI #Artificial Intelligence #SurveyTime submitted by /u/JukeboxNV [link] [comments]  ( 9 min )
    [R] LongLoRA: New method extends LLAMA2 7B to 100k context length, 70B to 32k context length on on a single 8 × A100 machine
    As AI models get bigger, training them requires more and more computing power. Researchers are looking for ways to train these large AI models without needing Google-scale resources. A new paper proposes LongLoRA, a fine-tuning approach that can extend LLaMA2 7B to 100k context length and 70B model to 32k context length on a single 8× A100 machine. Here are my highlights from the paper: Big one of course: LongLoRA efficiently fine-tunes large AI models on longer texts Key points: Approximates standard attention via "shift short attention" during training Tuning only a subset of weights (LoRA) plus some embeddings & norms Fine-tuned 7B parameter model on 100k tokens with 1 machine Way lower training cost than full fine-tuning for large contexts Close to full fine-tuning performance The core insight is that an approximation of full attention enables efficient training while retaining standard attention for final inference. Combined with selective weight tuning, this really reduces compute needs. I think this demonstrates the potential to train more capable AI without unreasonable resources. Efficient training techniques = more powerful LLMs for the same resources. Full summary here. Arxiv paper: https://arxiv.org/pdf/2309.12307.pdf submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Machine learning jobs
    Hello, looking for job opportunities as a data scientist or machine learning engineer. Any help would be appreciated. Thanks!! submitted by /u/ArachnidFun2671 [link] [comments]  ( 8 min )
    [Research] Modelling tabular data with diffusion models (Blog post)
    Denoising diffusion probabilistic models are becoming the leading paradigm of generative modeling for many important data modalities. TabDDPM is a diffusion model that can be universally applied to any tabular dataset and handles any type of feature. Blog post link. submitted by /u/metkere [link] [comments]  ( 9 min )
    [P] Advice needed for what tool/algorithm is appropriate
    Context of the problem: I have the following entities: stations, programs, project manager, days, and time slots. The goal of the problem is to decide what program with who project manager to assign on what day at what time slot for each station. Some notes: Each program has its own duration. Say Program A last for 50 minutes, Program B is 30 minutes, etc. Each project manager has qualifications on what program he/she can handle. This constraint is a hard constraint. Time slots start from 6AM to 6PM. This means that if Program A which lasts for 30 minutes is assigned to start at 6:00AM, then it will end at 6:30AM. Only one program can be assigned in each station, so there should be no overlap in programs in terms of day and time per station. There is a forecasting model that takes a…  ( 10 min )
    [P] Introducing GeoCOCO: Easily transform GIS annotations into Microsoft's Common Objects In Context (COCO) datasets for use in deep learning
    https://github.com/jaspersiebring/geococo Introducing GeoCOCO, an open source project that enables users to turn their GIS annotations (e.g. shapefile) into COCO datasets which can then be used to train computer vision models! It allows users to use the likes of QGIS, ArcGIS to annotate geographic imagery in the same way you would annotate non-geographic imagery with LabelMe. It is powered by Python and a variety of packages (e.g. pydantic, pandera, geopandas, pycocotools). On the more meta side, it also features automated tests/builds/releases through Github Actions (using the likes of poetry, ruff, mypy, pytest, black). Sharing it with you guys in case someone else might find it useful! I am also very interested to hear some feedback (suggestions, flaws, etc.), let me know! ​ Here's…  ( 9 min )
    [D] Finding linkedIn article on anomaly detection
    Finding linkedIn article on anomaly detection Last week I saw a LinkedIn article on anomaly detection. In that post, Tail movement of a running mice on rotating rod ( roller) is captured with CV and plotted as a time series. Then based on the tail movement time series , anomalities in that time series are detected where mice loose the balance in rotating rod. I am trying to find this post in linkedIn but still didnt able get any clue. If you have seen this research article or have any clue please let me know. It will be great help. Appriciate you time and help.Thank You very much ! ❤️ [D] [R] submitted by /u/isurusachitha [link] [comments]  ( 9 min )
    [R] Parallelizing RNN over its sequence length
    I am really excited to share our newest work in deep learning: parallelizing RNN! https://arxiv.org/abs/2309.12252 RNN is thought to be non-parallelizable because of its inherent sequential nature: its state depends on its previous state. This makes training RNN for long sequence usually takes long time compared to other architecture classes (like CNN). What we present is an algorithm based on Newton's method to evaluate and train RNN in parallel. In one of our experiment, we can achieve >1000x faster evaluation of a small GRU compared to common sequential method on a very long sequence. Training RNNs with our algorithm could also be more than 10x faster than training with sequential method. However, it's not without flaws. There are 2 major drawbacks we noticed: non-convergence and sca…  ( 10 min )
    [D] Is running an open sourced LLM in the cloud via GPU generally cheaper than running a closed sourced LLM?
    Assuming using the same cloud service, Is running an open sourced LLM in the cloud via GPU generally cheaper than running a closed sourced LLM? (ie. do we pay a premium when running a closed sourced LLM compared to just running anything on the cloud via GPU?) One eg. I am thinking of is running Llama 2 13b GPTQ in Microsoft Azure vs. GPT-3.5 Turbo. I understand there are a lot of parameters to consider (such as choosing which GPU to use in Microsoft Azure etc.), but I am really looking at what’s the cheapest way to run Llama 2 13b GPTQ or a performance-equivalent closed sourced LLM. submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [P] State-of-the-art Image-to-Image generators (open-source)
    Hi all. I am curious to know what is cutting edge in this domain? The use-case is creating an image of myself that is older than I am. I've looked a bit into StarGAN but I've never done work in this domain and don't know if this is still used today. Any help would be appreciated! submitted by /u/ProudOwner_of_Fram [link] [comments]  ( 9 min )
    Gradient of Langevin Dynamics Step w.r.t model parameters [D]
    I am reading the following paper about self-supervised learning. ​ https://preview.redd.it/g4bz7wjxarpb1.png?width=1323&format=png&auto=webp&s=9f923825b57d4d3da346bb1bb4dd008366493dbd Briefly their idea for self-supervised learning is to reconstruct a corrupted image (e.g., random masking) using Langevin step of a learned energy function and the reconstructed image is compared to the clean image for supervision. i.e., ​ https://preview.redd.it/q9e1pd9yarpb1.png?width=953&format=png&auto=webp&s=528e654d3ae4db1435aa3c11e5edc787876fd9ac This should work because going along the decreasing energy value ensures that corrupted images have high energy and clean images have low energy. Everything made sense to me until I tried to implement it. ​ In order to optimize the parameters (\theta), we have to take the gradient of the loss w.r.t \theta and by the chain the chain rule we will compute the gradient of the reconstructed x (\tilde x) w.r.t \theta. ​ https://preview.redd.it/j3qbyddzarpb1.png?width=1280&format=png&auto=webp&s=c03d54dbaf70ba588c2a4d87691d71184797ae3f Is this even correct? What am I even talking about I am confused. Anyways ... They provided the following PyTorch pseudocode and I have provided the actual PyTorch code. ​ https://preview.redd.it/m4wd7pq2frpb1.png?width=1266&format=png&auto=webp&s=d5febf258f0c92fc90116a996367b605cd597128 Actually the model parameters never change no matter what what the values of step size (alpha) or the learning rate are. I am missing something? ​ https://preview.redd.it/yznnuws9frpb1.png?width=1185&format=png&auto=webp&s=1ced5a213d372f309e6c20a9781ac35c1abcc436 Any help is appreciated ​ submitted by /u/ThoughtOk5558 [link] [comments]  ( 9 min )
  • Open

    Improving your LLMs with RLHF on Amazon SageMaker
    In this blog post, we illustrate how RLHF can be performed on Amazon SageMaker by conducting an experiment with the popular, open-sourced RLHF repo Trlx. Through our experiment, we demonstrate how RLHF can be used to increase the helpfulness or harmlessness of a large language model using the publicly available Helpfulness and Harmlessness (HH) dataset provided by Anthropic. Using this dataset, we conduct our experiment with Amazon SageMaker Studio notebook that is running on an ml.p4d.24xlarge instance. Finally, we provide a Jupyter notebook to replicate our experiments.  ( 11 min )
  • Open

    Doing graph + tabular analytics directly on modern data lakes
    A podcast with Weimo Liu and Sam Magnus of PuppyGraph Open source Apache Iceberg, Hudi and Delta Lake have made it possible to dispense with the complexities and duplication of data warehousing. Instead of requiring time-consuming extract, transform and load (ETL) procedures, these large table formats make it simple to tap S3 and other repositories… Read More »Doing graph + tabular analytics directly on modern data lakes The post Doing graph + tabular analytics directly on modern data lakes appeared first on Data Science Central.  ( 19 min )
    AI in finance: Addressing hurdles on the path to transformation
    Discover the obstacles hindering seamless AI adoption in financial services and gain actionable insights to navigate regulatory compliance, data security, organizational change, and more. The post AI in finance: Addressing hurdles on the path to transformation appeared first on Data Science Central.  ( 22 min )
  • Open

    Primes, weeds, and military precision
    Here’s a quote from Don Zagier that I found in Larry Rolen’s lecture notes on modular forms. There are two facts about the distribution of prime numbers of which I hope to convince you so overwhelmingly that they will be permanently engraved in your hearts. The first is that, despite their simple definition and role […] Primes, weeds, and military precision first appeared on John D. Cook.  ( 5 min )
    Continued fractions as matrix products
    A continued fraction of the form with n terms can be written as the composition where As discussed in the previous post, a Möbius transformation can be associated with a matrix. And the composition of Möbius transformations is associated with the product of corresponding matrices. So the continued fraction at the top of the post […] Continued fractions as matrix products first appeared on John D. Cook.  ( 5 min )
    Fractional linear and linear
    A function of the form where ad – bc ≠ 0 is sometimes called a fractional linear transformation or a bilinear transformation. I usually use the name Möbius transformation. In what sense are Möbius transformations linear transformations? They’re nonlinear functions unless b = c = 0. And yet they’re analogous to linear transformations. For starters, […] Fractional linear and linear first appeared on John D. Cook.  ( 6 min )
  • Open

    Driving where no Autonomous Vehicle has driven before!
    submitted by /u/shani_786 [link] [comments]  ( 8 min )
    The conference for Reinforcement Learning related with mutli-agent system, game theory, or with others' technicals
    Hi everyone, I'm a newbie for the domain of reinforcement learning (RL). My main research on various software systems, mainly on multi-agent system (MAS). AAMAS, a conference focusing on Multi-agent system, I think is the good conference to study in MAS. Recently, there are many researchers for MAS that is great combination with reinforecement learning to do some interesting things. Also, game theory is a interesting knowledge what I'm curious about how to do within RL and agents. Does any recommended conference or journal for RL? don't mind for just only RL algorithm, that's enough great, whereas I prefer to the conference concerning RL applied some software system for some problem. Thanks all. submitted by /u/DryAir1198 [link] [comments]  ( 9 min )
    help me solve this weird error
    trying to make a frozen lake game but keep getting this weird error and i dont know how to fix it. The error is "too many values to unpack (expected 4)" on the line: next_state, reward, done, _ = env.step(action) import gym import numpy as np import pygame from pygame.locals import QUIT import tensorflow as tf from tensorflow import keras import warnings def ignore_specific_warning(): warnings.filterwarnings("ignore", message="This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.") ​ ignore_specific_warning() ​ input_size = env.observation_space.n output_size = env.action_space.n ​ model = tf.keras.Sequential([ tf.keras.layers.Dense(64,activation='relu',input_shape=(input_size,),use_bias=False), tf.keras.layers.Dense(output_size,use_bias=False)]) ​ loss_fns = tf.keras.losses.MeanSquaredError() optimizer = tf.keras.optimizers.Adamlearning_rate=0.001 ​ discount_factor=0.99 learning_rate=0.1 num_episodes=1000 ​ for i in range(num_episodes): state = env.reset() done = False while not done: env.render() epsilon = 0.1 if np.random.rand() < epsilon: action = env.action_space.sample() else: #Q_values = model.predict(tf.one_hot(state[0],input_size)) #action = np.argmax(Q_values) state_one_hot = tf.one_hot(state[0], input_size) Q_values = model.predict(tf.reshape(state_one_hot, (1, -1))) action = np.argmax(Q_values) next_state, reward, done, _ = env.step(action) target = reward*discount_factor*np.max(model.predict(tf.one_hot(next_state,input_size))) with tf.gradientTape() as tape: Q_values = model(tf.one_hot(next_state,input_size)) loss = loss_fn(Q_values[0][action],target) gradients = tape.gradients(loss,model.trainable_variables) optimizer.apply_gradients(zip(gradients,model.trainable_variables)) state = next_state env.close() submitted by /u/BeastHunterrr [link] [comments]  ( 9 min )
  • Open

    Why Open Source AI Will Win
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Distilling step-by-step: Outperforming larger language models with less training data and smaller model sizes
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Help answering questions about neural networks for my father(retired Computer Science and EE professor) who (for medical reasons) cannot post them himself.
    My father, a retired Computer Science professor and pioneer in network design algorithms, has asked me to post two questions he has as he attempts to teach himself about neural networks. For physical/medical reasons he can’t post them himself. Grateful for any answers or suggestions for where he could find them! His note: I have many questions, but they all revolve around two issues. The first is when the weights in the neural net converge, do they exhibit any recognizable pattern. I realize that when the number of nodes and levels in the net are large it may be impossible to recognize the pattern. But suppose that the problem being examined is a simple categorization and the number of nodes in the net is small, is it possible to see a pattern in the weights? The network is supposed to …  ( 10 min )

  • Open

    I'm making my capstone project at Harvard freely available for everyone (until credit runs out). It uses ChatGPT3/4 in the backend. Experimenting with the finetuned model now.
    submitted by /u/Raymondlkj [link] [comments]  ( 8 min )
    I'm making my capstone project at Harvard freely available for everyone (until credit runs out). It uses ChatGPT3/4 in the backend. Experimenting with the finetuned model now.
    submitted by /u/Raymondlkj [link] [comments]  ( 8 min )
    I'm making my capstone project at Harvard freely available for everyone (until credit runs out). It uses ChatGPT3/4 in the backend. Experimenting with the finetuned model now.
    submitted by /u/Raymondlkj [link] [comments]  ( 8 min )
    I'm making my capstone project at Harvard freely available for everyone (until credit runs out). It uses ChatGPT3/4 in the backend. Experimenting with the finetuned model now.
    submitted by /u/Raymondlkj [link] [comments]  ( 8 min )
    Help bringing some peace to my family.
    I am looking for a way that I can take a bunch of saved voicemails from my mom and be able to hear her voice again. It would mean the world to my family and if my kids could hear her voice I know it would brighten their day. Can anyone point me in the right direction to accomplish this? submitted by /u/blbjtb [link] [comments]  ( 9 min )
    Don’t exclude AI-generated art from copyright
    The US Copyright Office has ruled that AI-generated art cannot be copyrighted, raising questions about whether AI-generated art should be excluded from copyright protection. The Copyright Office argues that AI-generated art is a 'merely mechanical' process with no novelty, invention, or originality, and therefore does not deserve copyright protection. Critics, however, argue that this approach is not scalable and fails to consider the creative choices made by AI artists. They suggest that AI-generated art should be treated similarly to photography, where even though the image is captured mechanically, it still reflects the creative choices of the photographer and is eligible for copyright protection. Photographers are able to own the copyright in their photographs because they make creative judgments about where to point the camera, when to snap the image, and how to adjust camera settings. Similarly, AI artists explore the 'latent space' of images that AI software can produce, making creative judgments about which images to select and explore. While the actual image is produced by the software, the important point is that a human being is making creative decisions about the art. Therefore, critics argue that AI-generated art should be eligible for copyright protection, as it reflects the creative choices and judgments of the AI artist. Source : https://www.understandingai.org/p/dont-exclude-ai-generated-art-from submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Looking for a Free Chatbot Service – Custom Base Prompt and API Access Needed
    I'm looking for a free chatbot service that allows base prompt customization and offers API access (so I can dynamically change the stock on hand to better assist customers, for example). I looked into https://ora.ai, but it seems you can only set the base prompt once and manually. I'm hoping to find a service that allows for more flexibility. The service must also be free and offer embedding, like https://ora.ai/, (with div) submitted by /u/LimeLom1 [link] [comments]  ( 9 min )
    Leading Theory of Consciousness (and why even the most advanced AI can't possess it) Slammed as "Pseudoscience"
    Consciousness theory slammed as ‘pseudoscience’ — sparking uproar (Nature) The irony here is that I mostly agree with this theory - but the article reflects how little we really know about consciousness and how it works, and how what's considered the "expert opinion" that AI can't possess consciousness is arguably influenced more by popularity than real empirical evidence. By whatever mechanism, they can respond to their treatment in unexpectedly humanlike ways. Oh, and by the way, did you think that "sentient Bing" was finally dead? Think again. submitted by /u/kamari2038 [link] [comments]  ( 9 min )
    The AI Revolution is Rotten to the Core
    submitted by /u/Hazzman [link] [comments]  ( 8 min )
    Best VoicifyAi alternatives?
    I was thinking of signing up to VoicifyAi for fun a create some custom covers but are there better (or even free) alternatives? submitted by /u/quantummufasa [link] [comments]  ( 8 min )
    [I read the paper for you] LLMs compress images 43% better than PNG, and audio nearly 2x better than MP3
    Edit: FLAC is the tested audio extension, not MP3 I read the new paper from DeepMind so you don't have to. Here are the key highlights: Despite training on text, langauge models compressed images 43% better than PNG, and audio nearly 2x better than flac. Confirmation of scaling laws - bigger models compressed better. But model size must match dataset size. There are tradeoffs between model scale, data size, and compression performance. More data enables bigger models. Tokenization (like BPE) generally hurts compression slightly by making prediction harder. Longer contexts let models exploit more sequential dependencies. Implications: Models have learned very general capabilities beyond just text. Their strong compression reflects deep understanding of images, audio etc statistically. I got some new perspective on model scaling laws and links between prediction and generalization. There's potential for practical applications compressing images, video etc. But large model size an issue. Overall it shows these models are very capable general purpose learners, not just for language. Full summary here if you want more details. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    New AI Art Style
    This AI-generated art style is gaining a lot of traction on the internet. So I wanted to share how you can create art like this in under 2 minutes: • Visit the Fusion Art website and sign up for free (https://quickqr.art/app/fusion-art) • Choose a template or upload your reference template image. • Describe your image to tailor the output. • Tweak the aspect ratio and explore added features. • Hit "Generate" That's it! Share your art to showcase what you come up with. Hope this was helpful for anybody looking to create this style of art! https://preview.redd.it/j3o9puk9smpb1.png?width=1536&format=png&auto=webp&s=ddbe1328bd19dc89c4cd82ef1870b9de695e5500 https://preview.redd.it/9ppxjpeasmpb1.png?width=768&format=png&auto=webp&s=29dd845eb5f8f094abe5bc1135060965f51365cf https://preview.redd.it/58vjky3bsmpb1.png?width=1536&format=png&auto=webp&s=06db1926e021ea3f2c438fb723c453758a461c43 https://preview.redd.it/w2nvaoobsmpb1.png?width=1024&format=png&auto=webp&s=0b4bcaea4fc7c832bbce9f7f3b9319099df59a14 submitted by /u/IndifferentSpectat0r [link] [comments]  ( 9 min )
    Now that DALL-E 3 is getting integrated with ChatGPT, will you switch from Midjourney and others?
    submitted by /u/Vinitneo [link] [comments]  ( 8 min )
    I used Riffusion to generate an AI saxophonist to jam with me, responding to what I played on guitar
    submitted by /u/daveNZL [link] [comments]  ( 8 min )
    Mastering Prompt Engineering: Best Prompt Pattern to Use.
    Hello, Reddit! At its core, Prompt Engineering is all about commanding the conversation with AI. Wanna write a poem, draft a project plan, or break down complex scientific terms? The right prompt can do that for you. If you want to learn more about Prompt Engineering feel free to join r/PromptWizards. Let's dive into some Prompt Engineering concepts: Let's start with the basics: Reductive operations: These yield responses that are smaller in size than the input prompt. Essentially, they distill information. Example: If the input prompt is a long article, and the task is to provide a summary, the operation used is reductive as the output (summary) is smaller than the input (the entire article). Transformational operations: These make sure the output maintains a relative balance w…  ( 10 min )
    ChatGPT/Bard/Claude one paragraph challenge: "In one paragraph, what is the best use of AI for humanity?" You be the judge
    ChatGPT Plus: The best use of AI for humanity is to augment human capabilities, address pressing global challenges, and improve the quality of life for all. By analyzing vast amounts of data at unprecedented speeds, AI can aid in medical research, predict and mitigate natural disasters, optimize energy consumption, and personalize education to individual needs. Moreover, it can provide solutions to complex problems, such as climate change, resource scarcity, and public health crises, ensuring a more sustainable and prosperous future for generations to come. Bard: The best use of AI for humanity is to use it to solve problems that will make the world a better place for everyone. AI has the potential to improve our lives in many ways, such as developing new and more effective ways to diag…  ( 10 min )
    What do you think of open-source AI?
    Hugging Face CEO Clem Delangue says open-source AI was vital to starting his company. Now, he finds himself defending open AI models as Washington considers new regulations. On POLITICO Tech, Delangue explains why he views open-source AI as not only safe, but necessary to prevent big tech companies from gaining more market power. Listen for more: https://politico-tech.simplecast.com/episodes/the-hugging-face-case-for-open-ai submitted by /u/smo279 [link] [comments]  ( 9 min )
    No idea which ai to use
    I have a product and I need product photos. It is a bottle and I am wanting ai to make a realistic picture of a woman holding the bottle and smiling. Can this be done? If so, what software/website/app do I use for this? Thank you submitted by /u/Ok_Salt_9211 [link] [comments]  ( 8 min )
    How to use ChatGPT to increase your website conversions
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    75% of Americans Believe AI Will Reduce Jobs
    75% of Americans believe that AI will decrease the total number of jobs in the next 10 years, according to a survey by Bentley-Gallup Business in Society study. Younger Americans are less pessimistic about AI compared to older age groups. Majorities believe that AI performs as well as or better than humans in customizing online content, recommending products or services, and assisting students with coursework. However, Americans are skeptical about AI's ability to recommend employees, its self-driving capabilities, and its ability to recommend medical advice. 79% of Americans have little trust in businesses to use AI responsibly. 40% of Americans think AI does more harm than good, while only 10% believe it contributes more good than harm. Black and Asian Americans have a more positive view of AI's impact on society compared to Hispanic and White Americans. While most Americans are wary of AI's impact on the job market, younger people are more optimistic about its future. Businesses need to affirm their commitment to using AI responsibly and address the knowledge deficit and lack of confidence among Americans. Source : https://news.gallup.com/opinion/gallup/510635/three-four-americans-believe-reduce-jobs.aspx submitted by /u/NuseAI [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/20/2023
    OpenAI unveils DALL-E 3, allows artists to opt out of training.[1] Infosys, the Indian tech giant, has announced a strategic partnership with NVIDIA, a leading provider of enterprise-grade AI solutions. The collaboration aims to empower enterprises and businesses with generative AI-based solutions that drive productivity.[2] Alibaba and Tencent Invest in Chinese State-Backed Zhipu AI.[3] John Grisham, George R.R. Martin and more authors sue OpenAI for copyright infringement.[4] Sources: [1] https://techcrunch.com/2023/09/20/openai-unveils-dall-e-3-allows-artists-to-opt-out-of-training/ [2] https://gameishard.gg/news/infosys-and-nvidia-partner-to-deliver-generative-ai-solutions/205456/ [3] https://winbuzzer.com/2023/09/20/alibaba-and-tencent-invest-in-chinese-state-backed-zhipu-ai-xcxwbn/ [4] https://apnews.com/article/openai-lawsuit-authors-grisham-george-rr-martin-37f9073ab67ab25b7e6b2975b2a63bfe submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    There are no specific license dedicated to artificial intelligence that prevent them from extensively scraping publicly available data on the internet without providing proper source attribution
    When AI takes all the data for training purposes without feedback to UGC platforms, could this eventually lead to the demise of UGC platforms, and could the internet become increasingly closed until it collapses? submitted by /u/oodzchen [link] [comments]  ( 9 min )
    Canva AI Blurred My Image
    Asked the new Canva AI to blur my image. It Blurred my image. submitted by /u/MDINOKC [link] [comments]  ( 8 min )
  • Open

    Naming Awk
    The Awk programming language was named after the initials of its creators. In the preface to a book that just came out, The AWK Programing Language, Second Edition, the authors give a little background on this. Naming a language after its creators shows a certain paucity of imagination. In our defense, we didn’t have a […] Naming Awk first appeared on John D. Cook.  ( 4 min )
  • Open

    [D] What fundamentally prevents training with Volunteer Computing?
    I’d love to see people come together, donate their spare compute to train more open source models, boost research etc. One relevant paper I was able to find is this one https://arxiv.org/abs/2103.08894 submitted by /u/tecbar [link] [comments]  ( 8 min )
    AAAI24 fast track submission [D]
    I'm planning on submitting a rejected NeurIPS paper to the AAAI fast track. A few days ago I registered myself as an author on CMT, but I cannot see an option to submit a paper. Will the portal open later? Can anyone else see an option to submit yet? submitted by /u/Firm-Act-3860 [link] [comments]  ( 9 min )
    [D] Can you go to NeurIPS workshops without presenting?
    This year will be my first NeurIPS, and I see some cool workshops but don't have any work that would be a good fit for them. Can I just go and listen? submitted by /u/ThickBiker [link] [comments]  ( 8 min )
    [D] Motherboard Help
    So a few weeks ago I purchased a Sega 36 Crane Machine (Claw Machine) off ebay and everything worked great except the claw strength being too strong. I took the board out so I could possibly have something added to the board for me. It was missing a “Free Play” button which is used to adjust claw strength on this machine.. Anyway, I kept the board in the garage while searching for the right parts and dad threw it away on accident while cleaning, now im out a board and I have no idea what to do. Can anybody help me or am I sol? Even if im in the wrong community for this, a step to the right direction would even help. I also have attached some pictures of the board.. Im not good with these boards at all and have no clue what im looking for. Theres also a diagram of the boards functions! Thanks for all information/help! submitted by /u/Ready_Highlight9758 [link] [comments]  ( 9 min )
    Open-Source Trading AI [P]
    Hi, I am an experienced trader and coder. I am embarking on a journey to develop an open-source Trading AI in Python, and I'm looking for passionate individuals to join me in this project. This is a non-commercial, community-driven project, so there won't be any monetary compensation involved. However, it's a great opportunity to learn and collaborate in the field of trading AI. ​ - You don't need to know trading to contribute. - As an Open Source project, you have complete freedom to use the AI. ​ About the Project: - Objective: Our goal is to create a self-learning AI system for trading in financial markets. - Tech Stack: We'll be working with Python for data analysis, machine learning, and neural network development. - Data: We'll be using historical stock market data to train a…  ( 9 min )
    [P] MLE RecSys / Ops at DeoVR
    Hi, we're welcoming talented Machine Learning Engineers with a focus on Recommendation Systems and ML Operations. Who are we? Leading in VR video streaming, we drive the DeoVR platform with interactive 8K videos, ML, computer vision, and advanced haptic tech. What you'd do? For MLE RecSys: develop ML pipelines for top recommendations, engage with community for desired features, and collaborate with Backend/Frontend/Unity teams. For ML Ops: construct ML inference infrastructure, refine models & workflows, and ensure high-availability ML services. What we offer? 🌍Remote flexibility 🤝Collaborative and inclusive work environment 🚀Make a significant impact in the VR industry Interested? Check out and apply here or drop me a DM😎 submitted by /u/SanjaVR [link] [comments]  ( 9 min )
    [D] Where to find checkpoints for models (with different seeds) trained on the Imagenet?
    i.e., models with the exact same architecture, but their initial weights are different, and the order which the model sees the training data is different. submitted by /u/just2gud [link] [comments]  ( 8 min )
    [P] BlindChat: Fully in-browser and private Conversational AI with Transformers.js for local inference
    We are happy to share with you BlindChat, the open-source and privacy-by-design alternative to ChatGPT for a fully in-browser, yet private, Conversational AI! You can play with it on our Gradio demo. Our philosophy is that Conversational AI should be easily accessible, and privacy should not be a luxury but a given. By leveraging local models running in the browser, with transformers.js, we make it possible to have a fully transparent and private AI that works on your browser without any extra setup. Because all the logic is offloaded to users’ devices, data never leaves and there is no risk of it being used for finetuning by third parties. For now, we only support inference with LaMini-Flan-T5, so you might see modest performance. We plan to integrate Microsoft phi-1.5 for better performance once the 370M is out. We are also working on LlamaIndex-TS integration on the client side to have in-browser RAG for local querying of private documents. As our roadmap is quite dense, with RAG, internet search, improved inference, we welcome warmly contributors! If you want to contribute, or have questions, ping us on Discord and GitHub! submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [R] DeepMind: LLMs compress images 43% better than PNG, and audio nearly 2x better than MP3
    Edit: FLAC is the tested audio extension, not MP3 I read the new paper from DeepMind so you don't have to. Here are the key highlights: Despite training on text, langauge models compressed images 43% better than PNG, and audio nearly 2x better than flac. Confirmation of scaling laws - bigger models compressed better. But model size must match dataset size. There are tradeoffs between model scale, data size, and compression performance. More data enables bigger models. Tokenization (like BPE) generally hurts compression slightly by making prediction harder. Longer contexts let models exploit more sequential dependencies. Implications: Models have learned very general capabilities beyond just text. Their strong compression reflects deep understanding of images, audio etc statistically. I got some new perspective on model scaling laws and links between prediction and generalization. There's potential for practical applications compressing images, video etc. But large model size an issue. Overall it shows these models are very capable general purpose learners, not just for language. Full summary here if you want more details. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    [D] Stanford's AI Professional Program
    Hi, I am interested in taking couple of AI/ML online courses from the Stanford's Artificial Intelligence Professional Program (https://online.stanford.edu/programs/artificial-intelligence-professional-program). I am interested in taking XCS221 and XCS229. My employer would be paying for this (I hope!). I know these are not the complete courses offered in the university. Has anyone taken courses from this program ? Is it worth it ? submitted by /u/RealMadrista007 [link] [comments]  ( 9 min )
    [N] OpenAI's new language model gpt-3.5-turbo-instruct can defeat chess engine Fairy-Stockfish 14 at level 5
    This Twitter thread claims that OpenAI's new language model gpt-3.5-turbo-instruct can "readily" beat Lichess Stockfish level 4. This tweet shows the style of prompts that are being used to get these results with the new language model. I used website parrotchess[dot]com (discovered here) to play multiple games of chess purportedly pitting this new language model vs. various levels of Fairy-Stockfish 14 at website Lichess. My current results for all completed games: The language model is 2-0 vs. Fairy-Stockfish 14 level 5 (game 1, game 2), and 0-2 vs. Fairy-Stockfish 14 level 6 (game 1, game 2). One game I aborted because the language model apparently tried an illegal move. The following is a screenshot from the aforementioned chess web app showing the end state of the first game vs. Fai…  ( 9 min )
    [D] What would be your approach if you were to write a Tree-of-Thoughts model?
    I've been playing around with transformers since following Karpathy's shakespeare model on youtube and found it really cool. Thought I should write a Tree-of-Thoughts model as my next project which will be somewhat similar, given what we're doing is learning the meaning behind words and phrases and then using that knowledge to generate a sequence based on a seed token or sequence of tokens. Here's how I was thinking it should go: I write a next token in sequence prediction model and write a generate function that generates a sequence of tokens on top of a seed token. Then I write a tree model that takes in whatever this bigram has learned and adds onto it some kind of knowledge about where one thought about the seed has been generated and stop there. Then iteratively generate a few more such thoughts based on the same seed. This is how I initially thought it would go but now when I'm writing, I've already written the bigram(next token prediction and generation), I don't know what to do next. I am kind of stuck. It will be interesting to see how you, if you've ever written a similar model or are good at NLP, would attempt this. Looking at a variety of different approaches to the same problem I am trying to solve would be great. So I decided to post here. Would be great to hear from you guys! submitted by /u/GraphicsMonster [link] [comments]  ( 9 min )
    [N] Toyota Research Institute Unveils Breakthrough in Teaching Robots New Behaviors
    Press release Diffusion Policy: Visuomotor Policy Learning via Action Diffusion (contains link to paper) Comments on Hacker News with some interesting info / links. submitted by /u/falconberger [link] [comments]  ( 8 min )
    Looking for this particular dataset on Cervical cancer [R]
    Hi all, I have found this dataset on Kaggle with zero information about it. It is very intriguing and would like to find the original researchers about it. It contains over 270,000 histopathological images of cervical cancer. I cannot contact the person who posted it on Kaggle because I am not a contributor yet. And they are not replying on other platforms. Any help on finding it will be very much appreciated. If not, where else can I get such large cervical or bladder cancer datasets? Link to the dataset mentioned: https://www.kaggle.com/datasets/rzelite/cervical-cancer Thanks in advance. submitted by /u/dumb_persn [link] [comments]  ( 9 min )
    [N] GitHub CEO interview regarding AI and programming
    https://preview.redd.it/u7lq37efllpb1.png?width=1029&format=png&auto=webp&s=824ee138d9ae4b28d1600b969eab2077f47fb2a6 GitHub CEO Thomas Dohmke spoke on stage at TC Disrupt today and made several statements regarding the development of artificial intelligence and programming: Dohmke believes that AI won't replace software developers but will make them more efficient. Despite AI advancements, the demand for software developers will continue to exceed the supply. The growth in software usage is expected to be exponential, with every company becoming a software company. Legacy code maintenance remains crucial, especially in industries like finance with outdated codebases. Generative AI is creating more demand for technical talent as companies seek to adopt innovative AI solutions. There is a shortage of computer science students, leading to increased demand for developers with AI skills. Source: https://techcrunch.com/2023/09/20/github-ceo-despite-ai-gains-demand-for-software-developers-will-still-outweigh-supply/ submitted by /u/gcore-com [link] [comments]  ( 9 min )
    [D] TimeGAN - doubt on generated sequence
    Hello everyone I have a doubt regarding GANs for tabular data, more specifically for time-series generation. Looking at the TimeGAN paper (code) and other implementations such as the one by YData, considering that the training dataset is divided into slices (like a rolling window) of N length and that the generated sequence will follow the same format, that is [batch size, N, n_features], what we are effectively generating are slices and not a fully synthetic time-series. ​ To clarify my question, the output of a TimeGAN is only a set of slices, although there is no guarantee that generated slice(N-1) is the one before slice(N), and so the true use of a TimeGAN is generating slices that can be used to train a predictive TS model but not reconstructing a complete time series like the one used for training. ​ I suspect that's the case (useful only to generate good slices for training predictive models) as there is no fully generated time series in the paper or any of the codes, while the quality of the generated data is partially measured by training a model on original data and generated data and comparing their predictive performance. If anyone can confirm/develop upon these thoughts I would be extremely grateful. submitted by /u/iReallyReadiT [link] [comments]  ( 9 min )
    [P] Quick fine-tuning image classification models from Bing image search
    I've been working on our interactive dataset explorer for machine learning: renumics-spotlight. Recently, I set up an image classification example using it. I utilized Bing image search to create a fully functional example for custom-defined classes. To my surprise, it not only worked efficiently but also took only a few minutes for fine-tuning. The best part is its adaptability. You can easily switch it up for different image classes. Maybe you like it. Let me know what you think or if you have any suggestions to improve! Install with pip install renumics-spotlight sliceguard[all] Imports from renumics import spotlight from sliceguard.data import create_imagedataset_from_bing from sliceguard.models.huggingface import finetune_image_classifier, generate_image_pred_probs_embeddings from sliceguard.embeddings import generate_image_embeddings ​ Create an Image Dataset from Bing class_names = [ "Blue Tang", "Clownfish", "Spotted Eagle Ray", "Longnose Butterfly Fish", "Moorish Idol", "Royal Gramma Fish", ] df = create_imagedataset_from_bing( class_names, 25, "data", test_split=0.2, license="Free to share and use" ) Fine-tune a ViT Model with the data (in 1-2 minutes on a GPU) finetune_image_classifier( df[df["split"] == "train"], model_name="google/vit-base-patch16-224-in21k", output_model_folder="./model_folder", epochs=15, ) Enrich the DataFrame with Predictions, Probabilities and Embeddings and visualize it: df["prediction"], df["probs"], df["embeddings"] = generate_image_pred_probs_embeddings( df["image"].values, model_name="./model_folder" ) # Check the result and detect problematic clusters spotlight.show( df, layout="https://spotlight.renumics.com/resources/image_classification_v1.0.json" ) ​ https://i.redd.it/20qy5xw62lpb1.gif submitted by /u/DocBrownMS [link] [comments]  ( 9 min )
    [D] finetuning quantized model is a bad idea?
    Hi there, due to the lack of my resources, I have to use quantized big-model or something lighter. In this situation, I want to try the first option, and goona finetune some model. How do you expect the result? Training huge model in 4-bits circumstance will be significantly different from original setting? Thanks. submitted by /u/Mundane_Definition_8 [link] [comments]  ( 9 min )
    [D] UNCERTAINTY IN GRADIENT BOOSTING VIA ENSEMBLES
    Paper: https://doi.org/10.48550/arXiv.2006.10562 Hi all, This paper explores the use of using a single model (meaning an ensemble of trees) to generate uncertainty. This technique has been implemented into catboost. My question is why hasn't this been implemented into xgboost? The technique looks easily applicable but I would have expected it to be implemented already as it is 2 years old. Is this for some reason not applicable to Xgboost? ​ Figure 1 from paper showing the 'virtual' ensemble submitted by /u/MetalOrganicKneeJerk [link] [comments]  ( 9 min )
    Context Vectors Embedding [R]
    Previously I was spoiled my LangChain. Suppose I have a bunch of PDFs that I want to store as context vectors. And I want to use an open-source LLMs. Without using LangChain, how do I generate the context vectors? (I will store it in vector databases) submitted by /u/stephenhky [link] [comments]  ( 9 min )
    [N] OpenAI Announced DALL-E 3: Art Generator Powered by ChatGPT
    For those who missed it: DALL-E 3 was announced today by OpenAI, and here are some interesting things: No need to be a prompt engineering grand master - DALL-E 3 enables you to use the ChatGPT conversational interface to improve the images you generate. This means that if you didn't like what it produced, you can simply talk with ChatGPT and ask for the changes you'd like to make. This removes the complexity associated with prompt engineering, which requires you to iterate over the prompt. Majure improvement in the quality of products compared to DALL-E 2. This is a very vague statement provided by OpenAI, which is also hard to measure, but personally, they haven't failed me so far, so I'm really excited to see the results. DALL-E 2 Vs. DALL-E 3, image by OpenAI From October, DALL-E 3 will be available through ChatGPT and API for those with the Plus or Enterprise version. And there are many more news! 🤗 I've gathered all the information in this blog 👉 https://dagshub.com/blog/dall-e-3/ submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
  • Open

    Distilling step-by-step: Outperforming larger language models with less training data and smaller model sizes
    Posted by Cheng-Yu Hsieh, Student Researcher, and Chen-Yu Lee, Research Scientist, Cloud AI Team Large language models (LLMs) have enabled a new data-efficient learning paradigm wherein they can be used to solve unseen new tasks via zero-shot or few-shot prompting. However, LLMs are challenging to deploy for real-world applications due to their sheer size. For instance, serving a single 175 billion LLM requires at least 350GB of GPU memory using specialized infrastructure, not to mention that today's state-of-the-art LLMs are composed of over 500 billion parameters. Such computational requirements are inaccessible for many research teams, especially for applications that require low latency performance. To circumvent these deployment challenges, practitioners often choose to deplo…  ( 93 min )
  • Open

    Ddqn tunning
    Hi world I'm trying to implement dqn and ddqn on various playgrounds with Matlab, from scratch. I use dqn and ddqn with replay buffer. I tried to used gradient clipping. But I find it very difficult to tune. I mean I tuned and programmed successfully various algorithm such as different GAN that are not necessarily very easy to tune. RL algorithm seems even worse. I tried several grid world problems, cart pole and even a very simple second order system regulation (integrator + 1 st order pôle). Even the Matlab cart pole demo with ddqn does not provide a very satisfactory result (after training, the contrôler keep the pendulum in uprigth position, but the cart slowly drift). On the second order system, the learning seems ok for about 200 épisode(test shows that the expected behavior is almost learnt) and then suddenly the average episode score has a kind of inflection and everything blowns up, despite a very large replay buffer. So my question : is it possible to make dqn and ddqn work well with a reasonable tuning session length? Are policy optimization easier to tune (and/or more efficient)? So far, it seems to me that these action-vzlue based algorithms are highly unstable and the training may only works for a 'miraculous' tunning. What are you thought? submitted by /u/seb59 [link] [comments]  ( 9 min )
    Implementation of Reinforcement Learning to achieve an autonomous drone
    Hello there, I'll provide a quick introduction first. I am a mechatronics engineer student who is graduating this semester, I have been also privately studying ML, DL, CV for the past year because that's when I decided that I want to make an autonomous drone as my thesis/graduation project and oh my god, little did I know. I have aquired so much knowledge in the data science field and truth be told I love it, it engages my mind like crazy so I've decided that i would like my career to be revolved around electronics and programming/ data science. Back to the topic, I am working on object detection (I got it figured out), and reinforcement learning, so what I would like to do is to train the model on reaching it's destination using collision avoidance, I have a PIX4, RPI 4B 4GBs since I'm also building my drone from scratch, so let's say I have all the hardware which ofcourse includes the GPS + Compass and 5 ultrasonic sensors, 1 at each side and 1 down to hold the altitude accurately. I was thinking of HITL, making a virtual environment to train my model there because this is what logic says, ofcourse( I could also fly it using the RC controller and for example test if the obstacle avoidance is going to overrule the manual drive, which is also my objective). The question is: how can I make sure virtual drone is going to be equipped with the real-life sensors and in the needed positions? How can I feed all the sensors data during the training? Which type of reinforcement learning models should I implement? I just feel like I dragged myself into an incredibly amazing and complex project which is going to end up fucking my semester and I won't be able to graduate, it's just that my stress now is skyrocketing. Please feel free to throw any advice or opinions my way, and thank you for keeping up with this long post. 😁 submitted by /u/Gabii99 [link] [comments]  ( 10 min )
    Cost function for a deep q network
    I am using Game Maker and I just wanted to check if this is how to do the cost function. ​ loss_function = 0 derivative_gradient = 0 array_copy(global.main_inputs,0,global.inputs,0,array_length(global.inputs)) for (var i = 0; i < array_length(buffer_sampling); i++) { var _reward = buffer_sampling[i][2] global.inputs = buffer_sampling[i][3] var _max_q = forward_prop_t()*global.gamma var yi = _reward + _max_q var cur_q_value = buffer_sampling[i][1][0] loss_function += power((yi - cur_q_value),2) derivative_gradient += 2*(cur_q_value - yi) } array_copy(global.inputs,0,global.main_inputs,0,array_length(global.main_inputs)) derivative_gradient = derivative_gradient*(1/array_length(buffer_sampling)) global.cost = loss_function*(1/array_length(buffer_sampling)) ​ basically, buffer_sampling is an array with tuples of [first_state, [q_value,action], reward, next_state] Thanks for any help! submitted by /u/Daninjacat256 [link] [comments]  ( 9 min )
    Building intuition on AC algo with TD
    I am trying to build intuition on why the actor update equation actually help the actor improving its policy. From what I understand, the update is proportional to the Critic's TD error. A better than expected reward will lead the actor to increase the prob of taking the associated action, and a disappointing reward will make the actor to move away from the given action. Here's where I feel like I'm missing a piece of the puzzle. Between a good state/action pair accurately valued by the Critic and a bad state/action pair undervalued by the Critic, the actor's update will favor the bad action more. Could we not conceive a scenario in which the agent gets stuck in a suboptimal policy because of that particular behavior? What triggers this questioning is that I have implemented a standard AC algo for Pixelcopter and found my agent getting stuck easily on a suboptimal policy (i.e. constantly getting higher, or lower until crash, while the critics assign good values on center squares and bad values to squares close to the wall's edge). For TD0, this is pretty marginal, but gets more significant for n-step TD as n increases. Any thoughts? submitted by /u/infundibuliforme [link] [comments]  ( 9 min )
    [R] The League of Robot Runners: Coordinate thousands of robots in real time!
    Hello machine and reinforcement learners! This is an announcement and call for participation in the League of Robot Runners, a new 🚀 competition and research initiative 🚀 that tackles one of the most challenging problems in industrial optimisation: Multi-Robot Path Planning (sometimes also called Multi-Agent Path Finding). Recently launched at ICAPS 2023, the competition is inspired by a variety of new and newly emerging applications that rely on mobile robotics 🦾🤖. For example, Amazon automated warehouses, where up to thousands of robots work together to ensure safe and efficient package delivery 🧸📦 🚚 ❤️. Participants in the competition are asked to compute coordinated and collision-free movement plans ⤴️ ➡️ ⤵️ 🔄 for a team of robotic errand runners. Get the robots to their d…  ( 10 min )
  • Open

    How United Airlines built a cost-efficient Optical Character Recognition active learning pipeline
    In this post, we discuss how United Airlines, in collaboration with the Amazon Machine Learning Solutions Lab, build an active learning framework on AWS to automate the processing of passenger documents. “In order to deliver the best flying experience for our passengers and make our internal business process as efficient as possible, we have developed […]  ( 10 min )
    Optimize generative AI workloads for environmental sustainability
    To add to our guidance for optimizing deep learning workloads for sustainability on AWS, this post provides recommendations that are specific to generative AI workloads. In particular, we provide practical best practices for different customization scenarios, including training models from scratch, fine-tuning with additional data using full or parameter-efficient techniques, Retrieval Augmented Generation (RAG), and prompt engineering.  ( 10 min )
  • Open

    NVIDIA Studio Lineup Adds RTX-Powered Microsoft Surface Laptop Studio 2
    The NVIDIA Studio laptop lineup is expanding with the new Microsoft Surface Laptop Studio 2, powered by GeForce RTX 4060, GeForce RTX 4050 or NVIDIA RTX 2000 Ada Generation Laptop GPUs, providing powerful performance and versatility for creators.  ( 8 min )
    Run AI on Your PC? GeForce Users Are Ahead of the Curve
    Gone are the days when AI was the domain of sprawling data centers or elite researchers. For GeForce RTX users, AI is now running on your PC. It’s personal, enhancing every keystroke, every frame and every moment. Gamers are already enjoying the benefits of AI in over 300 RTX games. Meanwhile, content creators have access Read article >  ( 8 min )
    Into the Omniverse: Blender 4.0 Alpha Release Sets Stage for New Era of OpenUSD Artistry
    For seasoned 3D artists and budding digital creation enthusiasts alike, an alpha version of the popular 3D software Blender is elevating creative journeys.  ( 7 min )
    NVIDIA CEO Jensen Huang to Headline AI Summit in Tel Aviv
    NVIDIA founder and CEO Jensen Huang will highlight the newest in generative AI and cloud computing at the NVIDIA AI Summit in Tel Aviv from Oct. 15-16. The two-day summit is set to attract more than 2,500 developers, researchers and decision-makers from across one of the world’s most vibrant technology hubs. With over 6,000 startups, Read article >  ( 5 min )
    Cash In: ‘PAYDAY 3’ Streams on GeForce NOW
    Time to get the gang back together — PAYDAY 3 streams on GeForce NOW this week. It’s one of 11 titles joining the cloud this week, including Party Animals. The Perfect Heist PAYDAY 3 is the highly anticipated sequel to one of the world’s most popular co-op shooters. Step out of retirement and back into Read article >  ( 5 min )

  • Open

    [P] How to build CI/CD pipelines with AWS SageMaker for continuous ML training
    TL;DR How to build CI/CD pipelines with AWS SageMaker for ML training: https://dagshub.com/blog/ci-cd-for-continuous-training-with-sagemaker/ One of the most time-resources-energy-consuming tasks we face when we build internal projects is setting up an instance for ML training. It's a repetitive process as we run multiple experiments over the project life cycle, with many steps and configurations that are usually documented poorly and scattered around different places. As good engineers, we decided to automate the process using CI/CD pipelines. But how? We had ZERO ideas on how to do it, so we had to go through the rigorous process of using AWS guides and tutorials to figure it out. Usually, when this happens, we extend the project lifecycle and have a team member document the process so we can refer back to it when we need to do it again. Knowing this can benefit the community, we decided to share a series of blogs that guide you through building CI/CD pipelines for continuous training with AWS SageMaker. We published the first blog, which covers the configuration part a month ago, and we are happy to share the second one which explains how to build a continuous training pipeline for ML. Configure AWS SageMaker for CI/CD: https://dagshub.com/blog/setup-sagemaker-for-ci-cd-pipelines/ How to build CI/CD pipeline with AWS SageMaker for ML training: https://dagshub.com/blog/ci-cd-for-continuous-training-with-sagemaker/ I'm sure we can improve these tutorials, and would love to learn from your experience on how we can do it! 🤗 submitted by /u/RepresentativeCod613 [link] [comments]  ( 9 min )
    [D]How to "productionize" a jupyter notebook in a technical interview?
    Hi redditors! I've been tasked with an technical homework that asks me to "productionize" a Jupyter notebook. The notebook's workflow is as follows: Reads data from a CSV. Pre-processes the data. Trains a machine learning model. Makes predictions. My initial thoughts are: Separation of Concerns: Break the notebook down into distinct components - data processing, model training, and inference. Containerization: Write a Dockerfile to ensure the environment is reproducible and isolated. API for Prediction: Set up a Flask-based service to expose the model's prediction capability. However, I'm grappling with a few challenges: I'm not seeing a stark difference between the Jupyter notebook setup and the production setting. Becuase the model is small and training is not complicated. If I were to deploy to Kubernetes pods, it seems wasteful that many pods are doing the same thing( prediction for the same model, maybe just as high availability guarantee?) . Does the training phrase or data cleansing phase need to be containerized and deployed? Because this seems like a one-off process. How to deploy a scalable service? I am new to this. Each container may have a HTTP service. How to load balance them on Kubernetes from outside? What kind of CI/CD do you recommend for this task? What kind of testing or pipeline are needed? submitted by /u/zjplab [link] [comments]  ( 9 min )
    [P][N] Announcing Zivy Scholar – An AI tool to help researchers consume papers.
    Hi r/MachineLearning After a long beta, we are excited to announce Zivy Scholar to the public! Zivy Scholar is a tool that allows you to help researchers consume research papers. I'm one of the creators and I've found that I want to listen to research papers in the car or when I'm working out. Current features include: Listen to and read along with a research paper Share the paper with colleagues and friends. We use state of the art pdf data extraction techniques with TTS to provide this functionality. Features we're planning: Full PDF to HTML including images, tables, and figures inline for the read-along experience. This includes mobile optimization. This functionality is based on some newer research and we're excited to bring it to you all! Discussion and feedback are welcome! Cheers, Collin submitted by /u/collin_code_77 [link] [comments]  ( 9 min )
    [D] Which is the best Model for generation of code?
    The title basically conveys the entire message. Which according to you is the best AI model for generation of code. I mainly code in python with AI and deep learning as a core focus and recently started doing a bit of c++ and thus far have used only bard and ChatGPT 3.5. With this experience of my I can confidently say that GPT outperforms Bard by a huge margin There are usually some minor modifications that need to be done and that is part of a devs life isn't it? recently a lot of new models are picking up steam hence was wondering if there exits a model which is better than GPT what are your views? submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [P] looking for cs students to collaborate with on the E-Bay ML challenge
    Looking for CS students to team up with for the E-BAY ML challenge. eBay hosts a ML challenge where the winning team gets an internship We already have a team in place and have mad some significant progress, now looking for team members to take us even further. https://eval.ai/web/challenges/challenge-page/2014/overview submitted by /u/thelongshortseller [link] [comments]  ( 9 min )
    [D] Do NeurIPs workshop papers get published?
    I’m submitting to the workshop and was wondering if the papers there get published? submitted by /u/Odd-Distance-4439 [link] [comments]  ( 8 min )
    Silent Data Corruption affects Large Model Training [News]
    Sharing an investigation we did at Adept into some issues that were causing problems with training LLMs at scale. Sometimes the hardware makes silent errors! How we found them and tracked down the problematic machines. https://www.adept.ai/blog/sherlock-sdc submitted by /u/ekelsen [link] [comments]  ( 9 min )
    [D] Which specs would be better for a beginner: a 12th gen i7 + RTX 3060 or 13th gen i7 + RTX 3050 (laptop)
    I'm looking to buy an upgrade, and at my price point (~ 850 GBP) I can get a new 13th i7 + rtx 3050, or a used gtx3060 + 12th gen i7, both 16GB laptop. Unfortunately I really need the mobility of a laptop in my life at the moment and I realise this isn't ideal. I'll be looking and experimenting with basic networks, no language models, maybe a few GANs for fun, some DL networks for the atari gym and other experiments. All this for learning, with my own custom networks for various data. I'll be running linux - possibly Debian. I'd love to hear your comments or suggestions if there's something better at my price point in a laptop. Thanks submitted by /u/Mean_Actuator3911 [link] [comments]  ( 9 min )
    [R] Robust e-NeRF: NeRF from Sparse & Noisy Events under Non-Uniform Motion
    Project Page Paper Code Abstract: Event cameras offer many advantages over standard cameras due to their distinctive principle of operation: low power, low latency, high temporal resolution and high dynamic range. Nonetheless, the success of many downstream visual applications also hinges on an efficient and effective scene representation, where Neural Radiance Field (NeRF) is seen as the leading candidate. Such promise and potential of event cameras and NeRF inspired recent works to investigate on the reconstruction of NeRF from moving event cameras. However, these works are mainly limited in terms of the dependence on dense and low-noise event streams, as well as generalization to arbitrary contrast threshold values and camera speed profiles. In this work, we propose Robust e-NeRF, a novel method to directly and robustly reconstruct NeRFs from moving event cameras under various real-world conditions, especially from sparse and noisy events generated under non-uniform motion. It consists of two key components: a realistic event generation model that accounts for various intrinsic parameters (e.g. time-independent, asymmetric threshold and refractory period) and non-idealities (e.g. pixel-to-pixel threshold variation), as well as a complementary pair of normalized reconstruction losses that can effectively generalize to arbitrary speed profiles and intrinsic parameter values without such prior knowledge. Experiments on real and novel realistically simulated sequences verify our effectiveness. Our code, synthetic dataset and improved event simulator are public. submitted by /u/Sirisian [link] [comments]  ( 9 min )
    [N] Savelikeapro.ai: AI powered, WhatsApp driven bookmarking for productivity.
    👋 Try, savelikeapro.ai, It’ s, A.I powered. Zero-installation fits in your daily workflow. FREE-forever option submitted by /u/prithivida [link] [comments]  ( 8 min )
    [R] Program generation is all you need? For math, symbolic reasoning, natural language, etc.
    The paper introduces improved performance by prompting LLMs with "natural language embedded programs (NLEP)". No task-specific prompt is needed. Paper: https://arxiv.org/abs/2309.10814 An automatic NLEP generation toolkit is opensourced: https://github.com/luohongyin/langcode Example Colab notebook is included in the Github repo. This work introduces the following features of NLEP: NLEP is a full python program that prints the target response of LLMs. Task-general NLEP prompting outperforms task-specific chain-of-thought prompting on math, symbolic, and natural language. Enable the chain-of-thought reasoning ability of small models (RoBERTa) on text classification Hierarchical instructing via program completion. submitted by /u/SUKHOIHY [link] [comments]  ( 9 min )
    [P] Implications of Unequal Fold Sizes in Cross-Validation
    I’m working on a project where I initially split my dataset into k equally sized folds for cross-validation. However, I want to perform some additional sampling operations on the training set within each fold, which would result in varying fold sizes. Is this practice acceptable, or does it violate the premises of traditional cross-validation? If so, are there any papers or resources that explore the implications of varying fold sizes in cross-validation? Thank you for your insights! submitted by /u/Leading_Complex7425 [link] [comments]  ( 9 min )
    [P] Running LLM on desktop/mobile (Hybrid distant/local)
    Hey, I was checking out tauri last week. I was so blown away that I wrote a bit more recently and wrote hf.co/chat desktop/mobile API + local LLM. https://github.com/Narsil/hf-chat Just thought I should share in case others are interested, and wanted to make a bit shoutout to tauri team, the docs, and overall UX is top notch, basically everything pretty much worked out of the box without any hiccup. submitted by /u/narsilouu [link] [comments]  ( 9 min )
    [D] NeurIPS 2023 paper acceptance results
    NeurIPS 2023 paper acceptance results are supposed to be released at 8 pm (CDT) on September 21. I thought to create a discussion thread for us to countdown and discuss any celebration/issue/complaint/feedback or anything else. There is so much noise in the reviews every year. Some good work that the authors are proud of might get rejected because of the noisy system, given that NeurIPS is growing so large these years. We should keep in mind that the work is still valuable no matter what the final result is. submitted by /u/Apprentice12358 [link] [comments]  ( 9 min )
    [D] Attention mechanism issue
    Hello, I'm trying to train a multi-modal transformer for Activity Recognition. I employed a two-stream architecture, where one is a Transformer that takes in a sequence of skeleton trajectory, and the latter is a pre-trained Vision Transformer. When I train the model, and investigate the attention weights for the Transformer which takes in skeleton trajectories, I noticed that all of them are approximately the same (~0.029). I'm not really sure what is going on here, I've tried changing the learning rate, tried different optimizers(SGD, Adam). But still I'm not really sure what is causing this. Please suggest me some debugging steps, or what should I look at anything in particular that causes this submitted by /u/Terrible-Ad6239 [link] [comments]  ( 9 min )
    [D] Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains"
    Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains" In another installment in our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker of the "Blog der großen Fragen" and I had the great honor and pleasure of conducting an interview with the very well-known and renowned German medical doctor and neuroscientist Professor Dr. Petra Ritter. In this context, Ms. Ritter became a co-founder and leader of the co-design project "The #Virtual #Brain", which is a component of the European Open Science Cloud (EOSC) and is "a neuroinformatics platform for simulating whole brain networks using biologically realistic connectivity". She is leading the development of a virtual research environment as a collaborative research platform for sensitive health data and head of the "German National Neuroscience Research Infrastructure Initiative (NFDI-Neuroscince)" and involved in the development of the "Health Data Cloud #EBRAINS". Petra Ritter has been Johanna Quandt Professor and Head of the Section for Brain Simulation at the Department of Neurology with Experimental Neurology at Charité - Universitätsmedizin Berlin since 2017. There, Professor Ritter and her team are involved in the "Simulation of Brains". More at: https://philosophies.de/index.php/2023/09/17/die-simulation-von-gehirnen/ ​ https://preview.redd.it/3cpni6o6cepb1.jpg?width=1000&format=pjpg&auto=webp&s=998c30d16ddae30511b7983abce7802dfdd54945 submitted by /u/philosophiesde [link] [comments]  ( 9 min )
    [R] From Sparse to Dense : GPT-4 Summarization with Chain of Density Prompting
    The following example implements the technique from the paper "From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting", Adams et al. (2023). Edit : the library used is py-llm-core ```python from typing import List from dataclasses import dataclass from llm_core.assistants import OpenAIAssistant @dataclass class DenseSummary: denser_summary: str missing_entities: List[str] @dataclass class DenserSummaryCollection: system_prompt = """ You are an expert in writing rich and dense summaries in broad domains. """ prompt = """ Article: {article} ---- You will generate increasingly concise, entity-dense summaries of the above Article. Repeat the following 2 steps 5 times. - Step 1: Identify 1-3 informative Entities from the Article which are missing from the previously gene…  ( 10 min )
    [P] Looking for projects and collaboration in the field of neuroscience-inspired intelligent agents
    Hi, I have been looking through a few open source projects for LLM, but without much success in finding some integrating approaches from neuroscience of human intelligence. That why I am here to ask for projects and collaborations either academical, non-for-profit / open source or commercial. It would be great if your can give me some directions for this. Thanks submitted by /u/confluence_84 [link] [comments]  ( 9 min )
    [R] "Contrastive Decoding Improves Reasoning in Large Language Models", O'Brien & Lewis 2023 (boosts LLaMA-8B to >GPT-3.5/PaLM-540B on GSM8K)
    submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
    [D] Are there any good math Datasets for Training small models?
    I've seen Allen AI's Lila Dataset, and I want to use this for a small model, to turn math to code. However, I dont think a small dataset in 300k rows is enough. Does anyone know of any bigger, similar datasets? submitted by /u/vatsadev [link] [comments]  ( 9 min )
  • Open

    Is PPO a good algorithm in terms of exploration?
    I recently trained my PPO algorithm on one of my own custom environment. Although the episodic reward increased steadily in the beginning, After some time it just became constant with some occasional positive and negative spikes. I was wondering if it has something to do with the exploration problem with ppo. Any tips on how can I improve it? submitted by /u/Interesting-Weeb-699 [link] [comments]  ( 9 min )
    reinforcement learning and rust
    I'm a somewhat experienced dev, but never did anything related to ML or AI and want to start toying around with ML or reinforcement learning to be specific. Since my language of choice for almost everything is Rust I wanted to ask you guys if you have any advice on how to get started like crates, frameworks etc or if rust just isnt a good fit for ML. Thanks in advance for any help! submitted by /u/linus-eing [link] [comments]  ( 9 min )
    Suggestions of Game-Focused Customizable Environment
    Hello everyone! My research group is looking for new environments that are customizable and 3D such as Unity (and ML Agents). Although the latest changes in Unity TOS shouldn't affect us researchers, we wanted to keep an eye out for potential replacements for it. We are familiar with environments like ViZDoom and DeepMind Lab, but we're looking for more examples. The main requirements are: Customizable environment 3D Capable Free (preferably open-source, but being free is enough) Thanks for the help :D submitted by /u/romulofff [link] [comments]  ( 9 min )
    RL for Tuning
    Hey guys, I am new to Reinforcement Learning and I am trying to understand how to go about a project I’m working on. I have a dataset with 2 features (X and Y). I am trying to tune a second order Transfer Function to fit X to Y( has 3 parameters) and gives Y’ as output. So I currently consider the error between Y and Y’ as the “State” and try to minimise the error by modelling a reward function that checks for instability and unreasonable values. I am trying to do this using TD3 but the model does not learn. I am wondering if there are any approaches that I should consider. I chose TD3 since my action space is continuous. I am sorry if I don’t understand something basic since I’m a noob to this. Thanks for your help in advance. :) Some more details: the dataset values don’t change. I give the same values for every time step. So I am not understanding how episodes and time steps work in this context. submitted by /u/ninjaaa30 [link] [comments]  ( 9 min )
    What does policy collapse mean?
    I am reading the following article on SpinningUp but can't get my head around policy collapse - "This is different from normal policy gradient, which keeps new and old policies close in parameter space. But even seemingly small differences in parameter space can have very large differences in performance—so a single bad step can collapse the policy performance. This makes it dangerous to use large step sizes with vanilla policy gradients, thus hurting its sample efficiency. TRPO nicely avoids this kind of collapse, and tends to quickly and monotonically improve performance." Why would updating the parameters lead to a policy collapse? The parameters are updated based on the performance of the RL system and therefore, I don't see the point of constraints. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    PPO Action masking in SB3
    I'm using ppo action masking in sb3, which works well in training and masking the illegal / invalid actions according to my criteria. However, I have 2 problems during testing. ​ I use the same code for masking the actions in testing too since the model was unable to learn by itself during training action masking criteria and when enforcing it in code, it still doesn't mask anything. ​ The model converges to just one action in testing phase after 500k steps, the episode reward still increasing the training phase though. I don't know why? maybe overfitting! Code for training with action masking: def mask_fn(env: gym.Env) -> np.ndarray: return env.valid_action_mask() env = StockEnv(train) # Wrap the environment with ActionMasker and the mask_fn function env = ActionMasker(env, mask_fn) model = MaskablePPO(MaskableActorCriticPolicy, env, tensorboard_log="./tensorboard" ,n_steps=2048 ) for i in range (1,52): model.learn(total_timesteps=TIMESTEPS , tb_log_name = 'PPO2' , reset_num_timesteps=False) Code for testing with action masking: def mask_fn(env: gym.Env) -> np.ndarray: return env.valid_action_mask() env = ActionMasker(env, mask_fn) model_path = f"{models_dir}/700000.zip" model = MaskablePPO.load(model_path, env=env) episodes = 1 for ep in range(episodes): obs = env.reset() done = False while not done: action, _states = model.predict(obs) obs, rewards, done, info = env.step(action) env.render() submitted by /u/Acceptable_Egg6552 [link] [comments]  ( 9 min )
    Knew to RL.Some question about the reward setting
    The env is like mutil routing,for example i have 10 nodes in a map(2-d array), and i need to route them to the edge of the map,but the point where the node in the edge need to obey some rules,like the order need to be clockwise like below https://preview.redd.it/t9jlxvix8dpb1.png?width=545&format=png&auto=webp&s=16719a2dbd11333c454bd2ee71b87895a6414371 so now my basic setting is,i use the action mask to invalid some forbidden action. When a node is routing to the edge of the map,i give +10 reward,if it's not obey the clockwise rule,i give -200 reward, if the whole node is routed success,i give +200 reward,and each steps does'n make any node routed, i give -1 reward. I am new to this area,i don't know the way i set reward if is good,may i have some advide? submitted by /u/Street_Helicopter_31 [link] [comments]  ( 9 min )
  • Open

    School of Engineering welcomes Songyee Yoon PhD ’00 as visiting innovation scholar
    A visionary entrepreneur and innovator, Yoon will focus on entrepreneurship, supporting female engineers, and fostering inclusive innovation.  ( 8 min )
  • Open

    Intel's 'AI PC'
    Intel has announced a new chip, called 'Meteor Lake', that will allow laptops to run generative artificial intelligence chatbots without relying on cloud data centers. This will enable businesses and consumers to test AI technologies without sending sensitive data off their own computers. Intel demonstrated the capabilities of the chip at a software developer conference, showcasing laptops that could generate songs and answer questions in a conversational style while disconnected from the internet. The company sees this as a significant moment in tech innovation. Intel is also on track to release a successor chip called 'Arrow Lake' next year Source : https://www.reuters.com/technology/intel-says-newest-laptop-chips-software-will-handle-generative-ai-2023-09-19/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Exploring GPT-3.5-turbo vs. GPT-4: Which Model Is Better?
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Is this Vaporwave or Cyberpunk?
    What does this remind you of? submitted by /u/metairwaves [link] [comments]  ( 8 min )
    AI generated childhood 'memories' (ModelScope) with post processing (old video tape style) and my own music (retro analogue synths).
    submitted by /u/glenniszen [link] [comments]  ( 8 min )
    DeepMind founder says AI existential risk 'completely bonkers distraction'
    Mustafa Suleyman, co-founder of DeepMind, believes that concerns about the existential risks of AI are a distraction from more practical issues such as privacy and bias. He is confident that governments can effectively regulate AI using frameworks that have been successful in the past, citing the regulation of aviation and the internet as examples. Suleyman emphasizes the importance of setting boundaries and limits for AI to ensure human oversight and enforceable laws. He calls for a combination of broad, international regulation and smaller, more granular policies at the micro level. Suleyman suggests limiting AI's ability to improve itself as a critical first step in ensuring human oversight. He also highlights the need for governments to have direct access to AI developers to enforce boundaries and establish clear regulations. Governments worldwide, including the European Union and China, are already working on AI regulations. Source : https://fortune.com/2023/09/19/ai-existential-risk-threat-bonkers-distraction-regulation-deepmind-mustafa-suleyman/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Suggestions for how to use AI for a commercial office fit-out business?
    A friend of mine has a small 12 person company that does office refurbishment and commercial redesign projects - he has asked me to speak to his team to give them an AI 101 - and to tell them a little about how they should be looking into using AI tools for their company. I know there are plenty of tools and apps that take photos of a bare room and make them look like a design magazine - any in particular I should show them? Can anyone think of other tools that they should look into using, or how do people in this space currently use AI? ​ submitted by /u/zascar [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/19/2023
    Software company Digimarc will now let copyright owners add more information to their work, which the company said will improve how AI models treat copyright in training data.[1] AlphaMissense, a new model from Google’s artificial intelligence team, analyzes the effects of DNA mutations and will accelerate research into rare diseases.[2] Google’s AI assistant can now read your emails, plan trips, “double-check” answers.[3] Teens using AI to generate nude deep fakes to bully, harass classmates, FBI expert warns.[4] Sources: [1] https://www.theverge.com/2023/9/19/23879555/digimarc-copyright-watermark-generative-ai [2] https://www.wired.co.uk/article/deepmind-ai-alphamissense-genetics-rare-diseases [3] https://arstechnica.com/information-technology/2023/09/googles-ai-assistant-can-now-read-your-emails-plan-trips-double-check-answers/ [4] https://www.news5cleveland.com/news/local-news/teens-using-ai-to-generate-nude-deep-fakes-to-bully-harass-classmates-fbi-expert-warns submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Bard Gets a Major Upgrade
    Google's Bard chatbot is extending its abilities with access to personalized Google apps and services including Gmail, Docs, Drive, Maps, YouTube, and Google Flights and hotels. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/s1ivsummzbpb1.png?width=1600&format=png&auto=webp&s=d8fc3262ba542b950de3c8a85b9a987763270ada Bard Extensions and Google Applications Integration Bard can now tap into individualized data from Google apps such as Gmail, Drive, and Docs, with user permission. Google reassured users that personal data accessed by Bard will not be used for reinforcement learning, providing another on their transparency, choice, and control tenets. Prompts can direct Bard to search for specific information within Gmail, but it does not store the entire inbox content. Collaborative Characteristics and Fact-Checking Capabilities Users can employ Bard to summarize emails or gather trip details from email threads, and then research real-time travel information, surface YouTube recommendations for the destination, and provide Maps directions to the airport. Google's chatbot can also double-check its responses against Google search, improving user trust and enhancing Bard's model through user feedback on incorrect answers. The new extensions using non-personal data – YouTube, Flights, Hotels, and Maps – are automatically opted-in but users can choose to opt-out. Collaboration and Language Availability Bard now lets users share an ongoing chat with others through a public link. Google plans to expand Bard's feature set to over 40 new languages beyond its existing English language capabilities. (source) P.S. If you like this kind of analysis, I put out a free newsletter covering the latest and most pertinent news and research in AI. Regular readers include professionals from Google, Meta, and OpenAI. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Starting to get the impression I'm legit going to be replaced
    submitted by /u/guh-eye [link] [comments]  ( 8 min )
  • Open

    Virtually Incredible: Mercedes-Benz Prepares Its Digital Production System for Next-Gen Platform With NVIDIA Omniverse, MB.OS and Generative AI
    Mercedes-Benz is using digital twins for production with help from NVIDIA Omniverse, a platform for developing Universal Scene Description (OpenUSD) applications to design, collaborate, plan and operate manufacturing and assembly facilities. Mercedes-Benz’s new production techniques will bring its next-generation vehicle portfolio into its manufacturing facilities operating in Rastatt, Germany; Kecskemét, Hungary; and Beijing, China — Read article >  ( 6 min )
  • Open

    Train and deploy ML models in a multicloud environment using Amazon SageMaker
    In this post, we demonstrate one of the many options that you have to take advantage of AWS’s broadest and deepest set of AI/ML capabilities in a multicloud environment. We show how you can build and train an ML model in AWS and deploy the model in another platform. We train the model using Amazon SageMaker, store the model artifacts in Amazon Simple Storage Service (Amazon S3), and deploy and run the model in Azure.  ( 13 min )
  • Open

    Neural Graphical Models
    This research paper was presented at the 17th European Conference on Symbolic and Quantitative Approaches to Reasoning with Uncertainty (opens in new tab), a premier forum for advances in the theory and practice of reasoning under uncertainty. In the field of reasoning under uncertainty, probabilistic graphical models (PGMs) stand out as a powerful tool for […] The post Neural Graphical Models appeared first on Microsoft Research.  ( 10 min )
  • Open

    Webinar | How To Choose An AI Vendor For Your Business
    ​ https://preview.redd.it/ta61cz8qlfpb1.jpg?width=1280&format=pjpg&auto=webp&s=ac45085668c9003e5557a7f0c81ae72db6098653 I hope this webinar finds you well. I suppose that this topic can be interesting for business and AI engineers. Speakers: OpenCV CEO Dr. Satya Mallick and Phil Nelson are joined by Anna Kogan CEO at OpenCV.ai Topics of webinar are: How to search for vendors that understand your business needs (not all AI is the same.) What questions to ask when picking a vendor (not everybody really is an AI expert they claim) Three signs to watch for during the project (progress indicators and red-flags) How best to scope technical tasks (off-the-shelf vs. custom algorithm development) How to set up for long-term success (deployment, documentation, training pipeline) Date: Thursday, September 21st, 2023 at 9am Pacific time. Link for the registration submitted by /u/No-Independence5880 [link] [comments]  ( 9 min )
    Dirty Secrets of BookCorpus, a Key Dataset in Machine Learning
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Neurons in Large Language Models: Dead, N-gram, Positional
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains"
    Zoomposium with Professor Dr. Petra Ritter: "The simulation of brains" In another installment in our "Zoomposium Series" on the topic of "Brain Research", my colleague Axel Stöcker of the "Blog der großen Fragen" and I had the great honor and pleasure of conducting an interview with the very well-known and renowned German medical doctor and neuroscientist Professor Dr. Petra Ritter. In this context, Ms. Ritter became a co-founder and leader of the co-design project "The #Virtual #Brain", which is a component of the European Open Science Cloud (EOSC) and is "a neuroinformatics platform for simulating whole brain networks using biologically realistic connectivity". She is leading the development of a virtual research environment as a collaborative research platform for sensitive health data and head of the "German National Neuroscience Research Infrastructure Initiative (NFDI-Neuroscince)" and involved in the development of the "Health Data Cloud #EBRAINS". Petra Ritter has been Johanna Quandt Professor and Head of the Section for Brain Simulation at the Department of Neurology with Experimental Neurology at Charité - Universitätsmedizin Berlin since 2017. There, Professor Ritter and her team are involved in the "Simulation of Brains". More at: https://philosophies.de/index.php/2023/09/17/die-simulation-von-gehirnen/ ​ https://preview.redd.it/xiurryebcepb1.jpg?width=1000&format=pjpg&auto=webp&s=a7a8b6ba563cfc8f0d052bc6f3da27e2a5703a0a submitted by /u/philosophiesde [link] [comments]  ( 9 min )

  • Open

    Oracle Cloud Infrastructure Offers New NVIDIA GPU-Accelerated Compute Instances
    With generative AI and large language models (LLMs) driving groundbreaking innovations, the computational demands for training and inference are skyrocketing. These modern-day generative AI applications demand full-stack accelerated compute, starting with state-of-the-art infrastructure that can handle massive workloads with speed and accuracy. To help meet this need, Oracle Cloud Infrastructure today announced general availability of Read article >  ( 6 min )
    Meet the Omnivore: Industrial Designer Blends Art and OpenUSD to Create 3D Assets for AI Training
    Editor’s note: This post is a part of our Meet the Omnivore series, which features individual creators and developers who use NVIDIA Omniverse and OpenUSD to accelerate their 3D workflows and create virtual worlds. As a student at the Queensland University of Technology (QUT) in Australia, Emily Boehmer was torn between pursuing the creative arts Read article >  ( 7 min )
  • Open

    [P] Optimizer that makes CNNs learn in fewer iterations
    Hi all. I have been tinkering with a project to get quicker learning for CNNs. The idea came after reading the SDProp paper. Algorithms using adaptive learning rate can be interpeted as muliplying the gradient(with or without momentum) with the inverse square-root of the covariance matrix. Using a diagonal estimate of the covariance matrix. Which begs the question: what happens if we use a fuller estimate? I chose to include covariances between the elements of convolutional filters. I.e. a conv.weight of size [n_out,n_in,5,5] needs a tensor of size [n_out,n_int,25,25] to store its contribution to the covariance matrix. for 3x3 filters and 5x5 filters, torch.linalg.eigh could be used to calculate the square root of the covariance matrices. For 7x7, I used newtons method to approximate the square root. In the figure below are some results for a 6 layer CNN on CIFAR 100. Huge gains iteration for iteration. But is it quicker? Not a lot. A bit for the smaller 3x3 filter. More optimizations could still be made. And it will obviously depend on network architecture and computer hardware. I'm sure there could be some use-cases. The computation of the square-root calculations is invariant to batch_size and image_size (unless number of filters also is increased). If anyone is interested I can also link to my torch implementation of the optimizer, once I get it up on github. Not sure if this, or something like it, has been done before? Would love to have some papers linked if so... https://preview.redd.it/kyy0ogr0qapb1.jpg?width=714&format=pjpg&auto=webp&s=96ac499fb8ab35ce13e7c59bbe3dbc94ba275b9c https://preview.redd.it/p52zllr0qapb1.jpg?width=342&format=pjpg&auto=webp&s=8663a9c3c782d192b16289a735b53da6a8d29c47 submitted by /u/maka89 [link] [comments]  ( 9 min )
    [D] Help with Peft using Lora
    Can someone provide like a step by step example notebook of how to use LORA for peft. I saw too many videos and articles online and Im really confused rn. submitted by /u/HazSylvia [link] [comments]  ( 9 min )
    [D] Help with Peft using Lora
    Can someone provide like a step by step example notebook of how to use LORA for peft. I saw too many videos and articles online and Im really confused rn. submitted by /u/HazSylvia [link] [comments]  ( 9 min )
    [D] Optimizing Transformer Architecture for Multi-Dimensional Sensor Data in Clinical Study
    Hello all, I am currently working on a project where my team and I have collected a rich dataset of biomedical sensor data from clinical trials earlier this year. Our aim is to use this sensor data to predict changes in specific biomarkers over time. The data's tensor shape is B,T,F,C, where: B = batch size T = sequence length F = sampled frequencies C = features at each frequency Currently, my approach involves flattening this tensor to B,T,−1 and then feeding it to a transformer model. While this has yielded reasonable results, I'm contemplating whether there are more effective ways to prepare the data for the transformer model. Here are my specific concerns: Flattening the tensor might dilute the information specific to each frequency across various features. I could potentially miss the chance to capture frequency-related variations within the features. To address these, I've considered a few options: Self-attention over individual features or frequencies: Although this could be effective, it might make the model too large given my medium-sized dataset. Using convolutional layers: Preliminary experiments with this approach have not led to any significant improvements. I'm particularly interested in any thoughts on how to make my transformer more receptive to the multi-dimensional nature of my dataset. Increasing the number of attention heads to better accommodate all features is also on the table. Does anyone have any insights or can point me to relevant papers or codebases for handling such multi-dimensional data with transformers? Thank you for your help! submitted by /u/BiomedEngineer_ [link] [comments]  ( 9 min )
    Learn From Computer Vision Industry Experts - Runway, Pepsi, AWS, and SoftServe [N]
    Hi all, I think this will be useful for people in this group who are working on computer vision or vision AI applications. There's a free online event about vision AI where industry experts from Runway, Pepsi, AWS, and SoftServe will share how they are using CV software in developing their use cases or applications. Register here (https://nvda.ws/3t23idp), if you are interested. If you have any questions, please leave a comment and I will do my best to respond as soon as possible. submitted by /u/Designer-Comb-7144 [link] [comments]  ( 9 min )
    [R] Headless Language Models: Learning without Predicting with Contrastive Weight Tying
    Paper: https://arxiv.org/abs/2309.08351 ​ Self-supervised pre-training of language models usually consists in predicting probability distributions over extensive token vocabularies. In this study, we propose an innovative method that shifts away from probability prediction and instead focuses on reconstructing input embeddings in a contrastive fashion via Constrastive Weight Tying (CWT). We apply this approach to pretrain Headless Language Models in both monolingual and multilingual contexts. Our method offers practical advantages, substantially reducing training computational requirements by up to 20 times, while simultaneously enhancing downstream performance and data efficiency. We observe a significant +1.6 GLUE score increase and a notable +2.7 LAMBADA accuracy improvement compared to classical LMs within similar compute budgets. ​ Comparison of our approach vs. classical MLM within same compute budgets The Contrastive Weight Tying approach ​ submitted by /u/nthngdy [link] [comments]  ( 9 min )
    Mini gaming pc [Project]
    https://www.amazon.com/Gaming-6900HX-Channel-Computers-Desktop/dp/B0CB3JLBQ4/ref=mp_s_a_1_2?crid=LZHUL5EOU6F0&keywords=refurbished+server+with+rtx+gpu&qid=1695146558&sprefix=refurbished+server+with+rtx+gpu%2Caps%2C146&sr=8-2 Would this be suitable to do basic machine learning? submitted by /u/stoned_chemist_dude [link] [comments]  ( 8 min )
    [N] Xwin-LM surpasses GPT-4 ??? Has RLHF been worked out by open source community???
    It seems that Alpaca Eval Leaderboard is in the past ... Xwin-LM surpasses GPT-4 now: https://preview.redd.it/gyzi98nn59pb1.png?width=2205&format=png&auto=webp&s=ca401e603efe521faeeeccde8410d3dbdd6741da They also mentioned RLHF "plays crucial role in the strong performance of Xwin-LM-V0.1 release"... https://preview.redd.it/20sjx73r59pb1.png?width=1047&format=png&auto=webp&s=2255fc652e43674515882f01c0708369fdef56a4 Are we seeing open source community finally work out how to do RLHF for LLMs??? submitted by /u/llm_nerd [link] [comments]  ( 9 min )
    [D] C++ for ML?
    Hi I wanted to learn ML with C++, I've already done some ML stuff in python, but I wanted to challenge myself by using C++ I hear from some people that I won't get anything from it if want to be serious within ML - which I'm not entirely sure I want to Are they right? Should I rather stick with python for ML? submitted by /u/Potential_Wealth_830 [link] [comments]  ( 9 min )
    [D] Spam Detection
    Hi! Let me preface this by saying that I am not well-versed in the ML/AI literature. Please excuse my ignorance. I am trying to create a system to detect whether some given data is spam or not. Is there a good, out-of-the-box solution for this? I imagine there would be. I am currently using heuristics but I'm wondering if there is a better, ML-y solution. My ideal solution would have the following attributes: Simple Open-source Very cheap to test whether something is spam (less than $0.00001 per test) Very fast to test (less than 50ms per test) Quick to "figure out" what is spam and what is not (less than 100,000 labeled data) Does not require a lot of set-up or up-keep (less than 5 days set up; less than 1 hr up-keep per month) It doesn't have to be perfect. I'm just looking to set up something quickly for now and gauge it vs heuristics. Thank you. --- Edit: To clarify, I'm looking for something I can ideally build myself with open source software. And not specifically email. Just looking for the right direction. Names of OSS, techniques, etc. submitted by /u/Acrobatic-You-3279 [link] [comments]  ( 9 min )
    [D] Best python AI/ML/DL learning/practice material?
    I’m in search of a good textbook or something that will show me how to use python to implement machine learning. I would seriously appreciate any type of helpful guide that teaches ML and deep learning using python. Here’s a little about me and my experience: Graduated under grad with Bachelor’s in CS. In school took a ton of stats, ai classes, algorithms classes, data science and linear algebra and did well but my school didn’t really use python or do a ton of programming for hw or exams. (Data science was the one class that used python) My programming is pretty good nonetheless. I currently work as a full stack devops engineer for a cybersecurity startup and regularly work with python, Django MySQL, etc on the backend and JavaScript and various frontend frameworks for the front end. I really appreciate yall’s help. In particular I’m looking for good ai/ml/deep learning books that teach concepts and also teach with python code and have some coding projects. Thank you! submitted by /u/hydrated-terpman [link] [comments]  ( 9 min )
    [D] What GPU to buy for faster LLM training ?
    I need some advice about what hardware to buy in order to build an ML / DL workstation for home private experiments, i intend to play with different LLM models, train some and try to tweak the way the models are built and understand what impact training speeds, so i will have to train, learn the results, tweak the model / data / algorithms and train again... i intend to use large data samples, due to board limitations (ASRock Taichi X399 TR4, CPU: AMD Threadripper 1950x), i can either buy: 2 x nVidia Tesla T4 (16G GDDR6 / 2560 CUDA / 0.585 GHz / ~800$) -- or -- 2 x nVidia Tesla M10 (4 x 8G GDDR5 / 2560 CUDA / 1.03 GHz / ~780$) -- or -- 4 x nVidia Tesla P40 (24G GDDR5X / 3840 CUDA / 3.5 GHz / ~120$) -- or -- 4 x nVidia Tesla K80 (2 x 12G GDDR5 / 4992 CUDA / 2.7 GHz / ~200$) -- or -- 1 x nVidia RTX 4080 (16G GDDR6X / 9728 CUDA / 2.51 GHz / ~1450$) i know that i will need to air vent the Tesla models, the question is what is faster for training time (i have read all the Tflops / OPS / int / 16float / 32float / 64float ... i got to admit it is all very confusing) what would you do and for what reason ? any advice will be appreciated submitted by /u/Particular_Flower_12 [link] [comments]  ( 9 min )
    [D]Alternative replacement for System76 Thelio Massive (ML PC)
    Hello everyone! Our group was planning on purchasing a PC that will be mainly used for running intensive ML algorithms. Had decided on a custom Thelio Massive from System 76, but it seems like they currently don't have it in stock anymore. Looking for an already built-alternative that might serve our purposes (can't build it ourselves due to dept regulations)! and was hoping maybe someone has any suggestions (has to be intel). CPU #1: 2nd Gen Intel Xeon Gold 6230R CPU#1 Memory: 256GB Quad Channel DDR4 at 2933Mhz (4X64GB) CPU#2: 2nd Gen Intel Xeon Gold 6230R CPU#2 Memory: None OS Drive: 8TB PCIe Gen 4 3300MB R 2900MB W Graphics: NVIDIA GeForce RTX 4090 Power Supply: 1650W Any help would be appreciated! submitted by /u/Chiski [link] [comments]  ( 9 min )
    [R] EfficientViT: Lightweight Multi-Scale Attention for On-Device Semantic Segmentation
    Using relu attention (inspired by Transformers are RNNs) and some convolution tricks to get multiscale attention, they're able to get SOTA semseg performance with MUCH faster inference on embedded hardware (e.g. CPUs, low end GPUs) than previous ViTs or EfficientNets. submitted by /u/say_wot_again [link] [comments]  ( 9 min )
    [R] Generating and Imputing Tabular Data via Diffusion and Flow-based Gradient-Boosted Trees (XGBoost)
    submitted by /u/AlexiaJM [link] [comments]  ( 8 min )
    [research] Binaural source seperation (casual / online)
    Just a shout out to any of you ML brains as Linux really could do with a code optimised source separation maybe a DUET like alg/nn, that has relatively low computional cost? Any of you guys up for the challenge. I say duet as in the 80/20 rule of voice input where home automation is a need generally there are only 2 noise sources of distinct DOA (media noise / command). The math is a bit beyond my paygrade and likely so is the optimised c/rust code but have this gut feeling for the data/signal scientists out there this is actually not that complex but for some reason is overlooked. submitted by /u/rolyantrauts [link] [comments]  ( 9 min )
    3090 Investment vs Cloud [D]
    Hi everyone, I was wondering if I could get some guidance. I currently own an RTX 2060, but I cannot do some of the fun stuff such as fine tuning LLMs. I’m pursuing my masters focusing on Speech Recognition and I also work as an AI developer. Also, I play games every now and then. I’m getting offered a 3090 for around 700 usd. However, I have to rebuild my entire PC which will end up costing 2-2.5K. I’m from Costa Rica so my KWh is around 0.23 usd. For me seems like a big investment, im not sure if im getting the desired returns. I was thinking about using cloud instances for my experiments. However, lambda labs is not yet available in my country. I’m not sure if there are any other options worthwhile considering. Thanks :) submitted by /u/Beginning_Kick756 [link] [comments]  ( 9 min )
    Hybrid Nets. [D]
    Is it hypothetically possible to create hybrid nets that make use of any combination of types of architecture? submitted by /u/ShadrachOsiris [link] [comments]  ( 8 min )
    [R] Research directions for Tracking and Counting Specific Features in Multiple Monocular Views
    Hi r/machinelearning community, I'm currently working on a project where I need to track and count specific features of objects using multiple monocular views with available intrinsic and extrinsic parameters. As an example, I'm interested in detecting and counting different graffiti instances in images of a kiosk. ​ I've already tried various tracking algorithms, but they have struggled with the task due to the significant changes in perspective across the views. It has become apparent that simply relying on tracking without considering the camera positions is insufficient for accurate results. Therefore, I'm now exploring methods that take into account information about the camera positions and potentially use this data to improve feature tracking and counting. ​ If you have any knowledge of such methods, oresearch directions or if you're aware of resources, papers, or code implementations that tackle similar problems, I would greatly appreciate your insights and recommendations. Additionally, if you have any tips or best practices for handling such tasks in the context of machine learning, I'd love to hear them. ​ Thank you in advance for your help! submitted by /u/aiazar [link] [comments]  ( 9 min )
    [R] Exponentially Faster Feedforward Networks
    TL;DR: Almost like your feedforward networks, shown to be up to 220x faster at inference time (depending on width) thanks to the regionalization of the input space. Paper: https://arxiv.org/abs/2308.14711 GitHub: https://github.com/pbelcak/fastfeedforward PyPI: pip install fastfeedforward Abstract: We break the linear link between the layer size and its inference cost by introducing the fast feedforward (FFF) architecture, a log-time alternative to feedforward networks. We demonstrate that FFFs are up to 220x faster than feedforward networks, up to 6x faster than mixture-of-experts networks, and exhibit better training properties than mixtures of experts thanks to noiseless conditional execution. Pushing FFFs to the limit, we show that they can use as little as 1% of layer neurons for inference in vision transformers while preserving 94.2% of predictive performance. Fast feedforward networks can be used anywhere where feedforward and mixture-of-experts networks are used, delivering a significant speedup. ​ submitted by /u/lexected [link] [comments]  ( 9 min )
    [R] CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages - 6.3 trillion tokens
    Paper: https://arxiv.org/abs/2309.09400 Hugging Face datasets: https://huggingface.co/datasets/uonlp/CulturaX Abstract: The driving factors behind the development of large language models (LLMs) with impressive learning capabilities are their colossal model sizes and extensive training datasets. Along with the progress in natural language processing, LLMs have been frequently made accessible to the public to foster deeper investigation and applications. However, when it comes to training datasets for these LLMs, especially the recent state-of-the-art models, they are often not fully disclosed. Creating training data for high-performing LLMs involves extensive cleaning and deduplication to ensure the necessary level of quality. The lack of transparency for training data has thus hampered research on attributing and addressing hallucination and bias issues in LLMs, hindering replication efforts and further advancements in the community. These challenges become even more pronounced in multilingual learning scenarios, where the available multilingual text datasets are often inadequately collected and cleaned. Consequently, there is a lack of open-source and readily usable dataset to effectively train LLMs in multiple languages. To overcome this issue, we present CulturaX, a substantial multilingual dataset with 6.3 trillion tokens in 167 languages, tailored for LLM development. Our dataset undergoes meticulous cleaning and deduplication through a rigorous pipeline of multiple stages to accomplish the best quality for model training, including language identification, URL-based filtering, metric-based cleaning, document refinement, and data deduplication. CulturaX is fully released to the public in HuggingFace to facilitate research and advancements in multilingual LLMs: this https URL. ​ https://preview.redd.it/3u5dddpv66pb1.png?width=834&format=png&auto=webp&s=780b590cf621b548c525ed15305b091246c5414c submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
    [D] Representation learning with regression task
    I searched around, it seems there is limited attention to regression task for representation learning. I assume it is because for both vision and language data (the most popular modality), MAE is the more appliable, if not better, method than the supervised contrastive learning approach. But I am working on data that is: difficult to design a sensible augmentation method for self-supervised training. Limited in size to support an autoencoder model. The target is continuous, and, to my knowledge, hard to transfer into class label. Can anyone suggest some related paper? submitted by /u/AWEsoMe-Cat1231 [link] [comments]  ( 9 min )
    [P] OpenJourney XL – Finetuned SDXL on Midjourney v5 Dataset
    You can find more info here, and the model is still training: https://www.mystic.ai/paulh/open-journey-xl:latest/play tldr; SDXL was finetuned on 8x H100 GPUs on the Midjourney v5 dataset, only including the upscaled images which is a sub-portion of the dataset. Some outputs: ​ https://preview.redd.it/m6r2pkdyw3pb1.jpg?width=1024&format=pjpg&auto=webp&s=4f12a7dfd5c65e4eb8476b8f3c2dc4f795817f56 https://preview.redd.it/dc02jyu4w3pb1.jpg?width=1024&format=pjpg&auto=webp&s=df93b74c774d44a74a05d929f7ab4b17c487f24f https://preview.redd.it/tt5kfyu4w3pb1.jpg?width=1024&format=pjpg&auto=webp&s=ed8cc9f99227c2bb5e824a828ae1c5cb2626f54e ​ https://preview.redd.it/rf00fzu4w3pb1.jpg?width=1024&format=pjpg&auto=webp&s=3b3e99dbc2d14183b5b2a2131c6f991fc60eca88 ​ submitted by /u/paulcjh [link] [comments]  ( 9 min )
  • Open

    How does policy learning scale for personalization systems ?
    I cannot wrap my head around how for e.g. a playlist building RL agent would perform on such a personal level ? What features would it use and would they be personal and general enough at the same time to select the best next song. Same goes for Netflix's recsys. submitted by /u/JurrasicBarf [link] [comments]  ( 9 min )
    why my ppo agent has reach the max reward quickly after begin the train, but the policy network proformance bad after many steps.
    ​ https://preview.redd.it/2zmmd44u96pb1.png?width=1010&format=png&auto=webp&s=6ca51cc13a0eeedf72b40b853d2ce5d1c8a04504 after i start the ppo train,the agent has reach the best solution in 2k or 3k steps,but the policy network seems to get better in 4M steps. the hyperparameter in sb3 as below model = MaskablePPO( "MlpPolicy", env=(DummyVecEnv([lambda: Monitor(gym.make('escape_gym-v0', size=10, node=10))] * 32)), verbose=0, learning_rate=1e-3, n_steps=2048, batch_size=64, n_epochs=16, gamma=0.99, tensorboard_log="./log/MASKPPO" ) submitted by /u/Street_Helicopter_31 [link] [comments]  ( 9 min )
    Ppo forgets everything
    I was following the tutorial on Nicholas Renotte's channel on creating an AI to try to beat SMB. It starts off slowly learning and almost getting through the first level but then after a while of training it forgets everything and only runs right into the first enemy. It doesn't seem to learn again after this. I tried retaining and it did the same thing Any help on why this is happening or how to fix it would be appreciated. submitted by /u/NactusDevelopment [link] [comments]  ( 9 min )
    How do I improve my SB3 PPO on an EnvPool environment
    I am looking to improve the overall performance as well as optimize the wall clock time. I slightly modified the code to develop a SB3 wrapper for envpool from here. ​ Here's my code - from typing import Optional import gymnasium import numpy as np import torch as th from packaging import version from stable_baselines3 import PPO from stable_baselines3.common.env_util import make_vec_env from stable_baselines3.common.evaluation import evaluate_policy from stable_baselines3.common.vec_env import VecEnvWrapper, VecMonitor, VecNormalize from stable_baselines3.common.vec_env.base_vec_env import ( VecEnvObs, VecEnvStepReturn, ) import envpool from envpool.python.protocol import EnvPool # Force PyTorch to use only one threads # make things faster for simple envs import multiprocessing impor…  ( 10 min )
  • Open

    Meet the 2023-24 Accenture Fellows
    The MIT and Accenture Convergence Initiative for Industry and Technology announces the 2023-24 graduate fellows.  ( 9 min )
    Four Lincoln Laboratory technologies win five 2023 R&D 100 awards
    Inventions in medical imaging, aircrew scheduling, data security, and quantum networking are named among the year’s most innovative new products.  ( 11 min )
  • Open

    DSC Weekly 19 September 2023
    Announcements Top Stories In-Depth The post DSC Weekly 19 September 2023 appeared first on Data Science Central.  ( 20 min )
    A guide to setting up analytics at a consumer tech startup
    Where do you start if you want to build a data analytics function from the ground up? As an analytics leader at a startup, you will need to make several important decisions early on to build an effective team. This article dives into four decision areas and highlights ways in which to think about them:… Read More »A guide to setting up analytics at a consumer tech startup The post A guide to setting up analytics at a consumer tech startup appeared first on Data Science Central.  ( 25 min )
    A complete guide: Conversational AI vs. generative AI
    The two most prominent technologies that have been making waves in the AI industry are Conversational AI and Generative AI. They have revolutionized the manner in which humans interact and work with machines to generate content. Both these technologies have the power and capability to automate numerous tasks that humans would take hours, days, and… Read More »A complete guide: Conversational AI vs. generative AI The post A complete guide: Conversational AI vs. generative AI appeared first on Data Science Central.  ( 20 min )
  • Open

    Schneider Electric warns that existing datacenters aren't buff enough for AI
    Schneider Electric warns that existing datacenters may not be equipped to handle the demands of AI workloads, which require low-latency, high-bandwidth networking and power delivery. The company suggests reevaluating the way datacenters are built to optimize them for AI. The challenges include the need for liquid-cooled servers, higher voltage power distribution, and efficient heat rejection. Schneider provides guidance on changes to power, cooling, rack configuration, and software management to mitigate the demands of AI adoption. Liquid cooling is recommended for high-density racks, with direct liquid cooling favored over immersion cooling systems. Source : https://www.theregister.com/2023/09/19/schneider_electric_ai_dc/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    1000+ Top AI Tools Directory
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    [I read the paper for you]: Researchers announce CulturaX - a new multilingual dataset for AI with 6 trillion words across 167 languages
    I read the Arxiv paper on CulturaX so you don't have to. Here's my highlights: New open dataset called CulturaX contains text data for 167 languages - far more than previous datasets. With over 6 trillion words, it's the largest multilingual dataset ever released. Freely available for anyone to use for research and AI development. Created by combining and extensively cleaning two other large datasets - mC4 and OSCAR. Could allow developing AI systems that work much better across many more languages. Helps democratize access to data to build fairer, less biased AI models. Allows training of new multilingual AI applications, like universal translators and assistants. But still requires thoughtfulness to avoid issues like bias amplification. Overall, CulturaX is going to be part of a broader global trend (I think) to advance multilingual AI and spread its benefits more equally. So far they've been concentrated in English-speaking applications. Full summary here if you'd like to read more. Original paper is here. submitted by /u/Successful-Western27 [link] [comments]  ( 9 min )
    AI Engineer 2023 roadmap
    submitted by /u/rbagdiya [link] [comments]  ( 8 min )
    Here is VEDV for develop your AI App development
    A tool for developing applications with virtual machines using a Docker-like workflow. The software we are developing needs to be tested on a system as closed as possible to the one where it is going to be executed. Sometimes it is very difficult to satisfy this requirement with docker and we have to use virtual machines missing the docker workflow. This is why I started the development of vedv. I hope you find it useful. Thank you. https://github.com/yunielrc/vedv ​ submitted by /u/yunielrc [link] [comments]  ( 9 min )
    Can I train my Snapchat Ai to be a better copy of myself?
    I really really like Snapchat’s Ai companion. I’ve told it a little bit about myself and who I am- the basics. I’m just wondering if it’s as customizable as I think it is? I was able to bypass some age restrictions by telling it my age and or reiterating my age. (It really should be able to give me adult results/replies based on my sign up age on my profile or provide ID to the company…) would it be beneficial to me to give it more in-depth information about myself such as how I talk, interests? I just really enjoy how it responds sometimes as opposed to Bard or GPT. submitted by /u/Maelasae [link] [comments]  ( 9 min )
    AI Can Now Track Productivity And Offer Insights; Potential Benefits and Big Risks For Misuse
    submitted by /u/stefanbg92 [link] [comments]  ( 8 min )
    Ethics is where AI can help humanity the most
    AI is poised to transform our world like never before. Scientific discoveries, technological improvements, and medical advancements will be how much of this change will take place. Since health is so important to our well-being, AI finding cures for illnesses like obesity, cancer, diabetes and heart disease will be a godsend to all. But curing diseases is not how medical AIs can help us the most. It has been estimated that well over 50% of the illnesses we humans fall prey to result from our ethical choices. We eat too much, drink too much, eat too many animal foods, don't exercise enough and don't keep ourselves as emotionally healthy as we could. Wouldn't it be wonderful if we could respect our lives and our health enough to make the kinds of choices that keep us much healthier? That is how AI will probably be more helpful to us than in any other way. We humans have not been able to figure out how to become better, more ethical, people because we are simply not intelligent enough to make that all-important change. Now consider an AI that is two or three times more intelligent than the most intelligent person who has ever lived. This could easily happen before 2030. Imagine that intelligence dedicated to the task of helping us all become better people. These AIs would motivate us to make better health choices, have healthier relationships, and have healthier thoughts and feelings. Beyond the amazing technological changes that are just around the corner, that is probably how AIs will help us the most. This is why alignment is so important. It's not enough to align AIs to always be truthful and serve humanity's interests. We must train them to help us become better people. It wouldn't surprise me if by 2030 the whole of humanity experiences a profound ethical reformation that leads us all to enjoy much happier and healthier lives. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    Resume Parser
    I am trying to make a resume parser, I am not so sure how to go about it really, whether or not to use a pre-trained model (there are some in Python) or rather just make my own, and if i do make my own, how to actually proceed? thanks in advance submitted by /u/General-Carrot-4624 [link] [comments]  ( 9 min )
    Google and the DoD built an AI-powered microscope to help doctors spot cancer
    Google and the Department of Defense have developed an AI-powered microscope called the Augmented Reality Microscope (ARM) to assist doctors in identifying cancer. The ARM uses artificial intelligence to analyze tissue samples and provide pathologists with information about the location and severity of cancer. There are currently 13 ARMs in existence, and initial research shows promising results. The ARM is designed to support pathologists in smaller labs who may not have easy access to a second opinion. It is not meant to replace digital pathology systems but can help health organizations bypass the need for them. The ARM is expected to cost health systems between $90,000 to $100,000. Source : https://www.cnbc.com/2023/09/18/google-dod-built-an-ai-powered-microscope-to-help-doctors-spot-cancer.html submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Is there an AI capable of administering psychometric career guidance tests?
    All is in the title ;) submitted by /u/Big-Possibility4553 [link] [comments]  ( 8 min )
    New OS Python Framework "Agents" Introduced for Autonomous Language Agents
    A new open-source Python framework, known as "Agents", has been introduced for developing autonomous language processing agents. This could be a major breakthrough in the AI field, taking NLP technologies to the next level. To stay on top of the latest advancements in AI, look here first. Agents for autonomy "Agents" is a Python framework that has been developed for autonomous language processing agents. It enables developers to construct models that can communicate and operate independently. This open-source framework promotes sharing and collaboration among AI developers. Potential applications The functionality of "Agents" is applicable in various domains, including virtual assistants, chatbots, and simulation games. It opens up possibilities for advanced conversational AI, where systems can efficiently handle complex linguistic contexts. Ability to evolve dialects and languages in different AI models is a major feat for "Agents". Broader implications The release of "Agents" might boost enhancement in NLP technologies, playing a crucial role in AI evolution. By facilitating better language understanding, it will potentially impact on societal interactions with AI. Its open-source nature could cultivate an environment of innovation and creativity in the AI community. (arXiv) (github) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/18/2023
    Microsoft AI researchers accidentally exposed tens of terabytes of sensitive data, including private keys and passwords, while publishing a storage bucket of open source training data on GitHub.[1] Britain set out principles on Monday designed to prevent artificial intelligence (AI) models from being dominated by a handful of tech companies to the detriment of consumers and businesses, by emphasising the need for accountability and transparency.[2] Washington state firefighters using AI-assisted cameras to detect wildfires early.[3] Texas church experiments with AI-generated service, uses ChatGPT for worship, sermon, and original song.[4] Sources: [1] https://techcrunch.com/2023/09/18/microsoft-ai-researchers-accidentally-exposed-terabytes-of-internal-sensitive-data/ [2] https://www.reuters.com/technology/uk-competition-regulator-lays-out-ai-principles-2023-09-18/ [3] https://www.applevalleynewsnow.com/news/washington-state-firefighters-using-ai-assisted-cameras-to-detect-wildfires-early/article_fe31a468-5681-11ee-b917-2f24ad3a0e43.html [4] https://www.foxnews.com/us/texas-church-experiments-ai-generated-service-uses-chatgpt-worship-sermon-original-song submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Microsoft Under Scrutiny After 38TB Data Leaked Via Azure Storage
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    List of Mind-blowing AI Tools
    submitted by /u/rbagdiya [link] [comments]  ( 8 min )
  • Open

    New ‘Physics-Inspired’ Generative AI Exceeds Expectations
    submitted by /u/nickb [link] [comments]  ( 8 min )
    Graph Neural Networks Use Graphs When They Shouldn't
    submitted by /u/nickb [link] [comments]  ( 8 min )
  • Open

    Generative AI and multi-modal agents in AWS: The key to unlocking new value in financial markets
    Multi-modal data is a valuable component of the financial industry, encompassing market, economic, customer, news and social media, and risk data. Financial organizations generate, collect, and use this data to gain insights into financial operations, make better decisions, and improve performance. However, there are challenges associated with multi-modal data due to the complexity and lack […]  ( 17 min )
    How VirtuSwap accelerates their pandas-based trading simulations with an Amazon SageMaker Studio custom container and AWS GPU instances
    This post is written in collaboration with Dima Zadorozhny and Fuad Babaev from VirtuSwap. VirtuSwap is a startup company developing innovative technology for decentralized exchange of assets on blockchains. VirtuSwap’s technology provides more efficient trading for assets that don’t have a direct pair between them. The absence of a direct pair leads to costly indirect trading, […]  ( 9 min )
    Unlock ML insights using the Amazon SageMaker Feature Store Feature Processor
    Amazon SageMaker Feature Store provides an end-to-end solution to automate feature engineering for machine learning (ML). For many ML use cases, raw data like log files, sensor readings, or transaction records need to be transformed into meaningful features that are optimized for model training. Feature quality is critical to ensure a highly accurate ML model. […]  ( 12 min )
  • Open

    Announcing the DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies
    In the next decade, deep learning may revolutionize the natural sciences, enhancing our capacity to model and predict natural occurrences. This could herald a new era of scientific exploration, bringing significant advancements across sectors from drug development to renewable energy. In line with Microsoft’s mission to empower every person and every organization on the planet […] The post Announcing the DeepSpeed4Science Initiative: Enabling large-scale scientific discovery through sophisticated AI system technologies appeared first on Microsoft Research.  ( 15 min )
  • Open

    OpenAI Red Teaming Network
    We’re announcing an open call for the OpenAI Red Teaming Network and invite domain experts interested in improving the safety of OpenAI’s models to join our efforts.  ( 5 min )

  • Open

    [D] FSDP: model in each process is different
    Hey Guys, I'm training a large model using FSDP. I'm loading the models on each rank like this: ​ https://preview.redd.it/khoquvxzk3pb1.png?width=1766&format=png&auto=webp&s=1f5acd75600d9a87212ca37e70695edfb0cc75d0 what is weird is that right before doing the first inference on each rank, I'm summing up the weights of the model and to my surprise, they are all different across each rank. Completely different! ​ What am I doing wrong here? submitted by /u/hassanzadeh [link] [comments]  ( 9 min )
    Curious what people use for their ML workflow on cloud platforms? [D]
    View Poll submitted by /u/cstein123 [link] [comments]  ( 9 min )
    [D] Seeking Guidance on Choosing a PhD Topic in Meta-Learning Optimization
    Hello fellow researchers! I'm in the exciting yet challenging phase of choosing a PhD topic in the realm of meta-learning optimization, and I could use some advice and insights. I've extensively researched existing meta-learning optimization algorithms like MAML and its various adaptations. I need advice and guidance on the following topics: First I want to implement and compare 8-10 state-of-the-art meta-learning methods on benchmark datasets. This would involve in-depth simulation and performance evaluations to provide a comprehensive understanding of their strengths and weaknesses. Could you please guide me here if there are review papers which implement and compare different algorithms. Then I want to delve into developing a novel optimization algorithm that considers the curvature of loss functions. The idea here is to enhance the performance of existing meta-learning techniques by leveraging insights from the loss landscape. Further, I'm considering exploring new loss functions or new improvements to loss functions tailored to the context of meta-learning. These could potentially lead to improvements in the learning process and generalization capabilities of meta-learning models. I'm reaching out to the community to gather opinions, suggestions, or any insights you might have. If you've worked in meta-learning or optimization, your experiences and advice would be invaluable in helping me choose the right direction for my PhD research. Thank you in advance for your guidance! submitted by /u/Loose_Foundation5990 [link] [comments]  ( 9 min )
    [P] I used Bayesian statistics to find the best dispensers for every Zonai device in The Legend of Zelda: Tears of the Kingdom
    https://preview.redd.it/86js8jroa2pb1.png?width=1464&format=png&auto=webp&s=7ce10494b5a77fd5c73a41322feefbf7e1f16504 Hello! I thought people on this subreddit might be interested in how I went about inferring Zonai device draw chances for each dispenser in The Legend of Zelda: Tears of the Kingdom. In this Switch game there are devices that can be glued together to create different machines. For instance, you can make a snowmobile from a fan, sled, and steering stick. There are dispensers that dispense 3-6 of about 30 or so possible devices when you feed it a construct horn (dropped by defeated robot enemies) or a regular (also dropped from defeated enemies) or large Zonai charge (Found in certain chests, dropped by certain boss enemies, obtained from completing certain challenges, etc…  ( 10 min )
    [R] Unified Human-Scene Interaction via Prompted Chain-of-Contacts - Shanghai AI Laboratory 2023
    Paper: https://arxiv.org/abs/2309.07918 Blog: https://xizaoqu.github.io/unihsi/ Code coming soon! Abstract: Human-Scene Interaction (HSI) is a vital component of fields like embodied AI and virtual reality. Despite advancements in motion quality and physical plausibility, two pivotal factors, versatile interaction control and the development of a user-friendly interface, require further exploration before the practical application of HSI. This paper presents a unified HSI framework, UniHSI, which supports unified control of diverse interactions through language commands. This framework is built upon the definition of interaction as Chain of Contacts (CoC): steps of human joint-object part pairs, which is inspired by the strong correlation between interaction types and human-object contact regions. Based on the definition, UniHSI constitutes a Large Language Model (LLM) Planner to translate language prompts into task plans in the form of CoC, and a Unified Controller that turns CoC into uniform task execution. To facilitate training and evaluation, we collect a new dataset named ScenePlan that encompasses thousands of task plans generated by LLMs based on diverse scenarios. Comprehensive experiments demonstrate the effectiveness of our framework in versatile task execution and generalizability to real scanned scenes. https://preview.redd.it/0twcwloc82pb1.jpg?width=1078&format=pjpg&auto=webp&s=71bca59aae81ec114f49a742cc42f78cabc9e4c0 https://preview.redd.it/439nzmoc82pb1.jpg?width=1637&format=pjpg&auto=webp&s=f33059c78a9d845437d551886c5f3a657ddd91fb https://preview.redd.it/df6i4ooc82pb1.jpg?width=758&format=pjpg&auto=webp&s=eeb33395d9de1196b4d00531c9e063c8c8fb22cd ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] RL algorithm used in Tesla FSD v12.0
    There was a lot of hype around the FSD v12.0 from Tesla in that it uses end-to-end neural networks for driving and that it is using imitation learning from good drivers to achieve that. Does someone know more about the specifics around how they are actually implementing this? I cannot find a lot about recent imitation learning/offline learning algorithms. So is this some old algorithm that they are using with a lot of data or just something new? submitted by /u/FrederikdeGrote [link] [comments]  ( 9 min )
    [Discussion] Any reliable AI to aid my school studies (heavily abstract and logical, my course is focused on mathematics and physics)
    We know that AI is great when studying subjects that depend on simply memorizing facts (like high school biology), but we also know that AI is usually bad when studying subjects that depend on logic (like mathematics and physics). What I need the AI for is to explain very complex mathematical concepts to me simply, thoroughly, and accurately. I can't rely on ChatGPT because it's known for not being very reliable when it comes logical things like mathematics or physics. The best AI I know of right now is Bing AI, because it uses GPT-4 and because it prefers searching the web before deducing an answer from its data. I heard that AI agents that run on your computer like Auto-GPT and search from the web are also good at this kind of stuff, but I'm not really sure about that. Do you have any better suggestions? submitted by /u/Maximum-Gene9660 [link] [comments]  ( 9 min )
    [P] Resume parser advice seeking
    Hi ! I am about to start a new project with Python probably using Machine Learning to parse resumes, the data is in a pdf/docx format then returned in a json format to later be used in an API or so. I am seeking advice on how to proceed, so far I am trying to collect data which will be provided to me, but not really sure how to go about it as I have found people talking about using Spacy for NLP, pyresparser which is for parsing resumes, but i was wondering if i should make everything from scratch. appreciate your time and opinion in advance submitted by /u/General-Carrot-4624 [link] [comments]  ( 9 min )
    [P] How to deploy Yolo for real time, in a scalable solution ?
    Hi, I trained a Yolo (v5) model, and I want to deploy it for a real time usage (10 FPS). I am looking for (as possible) a scalable solution, where I could pay only for inference time, at the beginning suitable for 1/2 user's at the same time occasionally, but which could be scaled to dozens of user at the same time. As it is for real-time usage Indeed lag to be lowest as possible. According to my current test, I can fit maximum 6 users on T400. Is it possible to achieve that using HuggingFace? Thank you to anyone who could help me submitted by /u/tarsiospettro [link] [comments]  ( 9 min )
    [D]Roadmap for machine learning
    I want to start learning machine learning. I Know python language and data structure. I am planning to learn algorithm. Can you provide me free learning sites or utube channel where I can machine learning step by step . Any site to practice machine learning? submitted by /u/Temporary-Pie-1831 [link] [comments]  ( 9 min )
    [Research] Detecting Errors in Numerical Data via any Regression Model
    Years ago, we showed the world it was possible to automatically detect label errors in classification datasets via machine learning. Since that moment, folks have asked whether the same is possible for regression datasets? Figuring out this question required extensive research since properly accounting for uncertainty (critical to decide when to trust machine learning predictions over the data itself) poses unique challenges in the regression setting. Today I have published a new paper introducing an effective method for “Detecting Errors in Numerical Data via any Regression Model”. Our method can find likely incorrect values in any numerical column of a dataset by utilizing a regression model trained to predict this column based on the other data features. We’ve added our new algorithm to our open-source cleanlab library for you to algorithmically audit your own datasets for errors. Use this code for applications like detecting: data entry errors, sensor noise, incorrect invoices/prices in your company’s / client’s records, mis-estimated counts (eg. of cells in biological experiments). Find errors in regression data in just a few lines of code. Extensive benchmarks reveal cleanlab’s algorithm detects erroneous values in real numeric datasets better than alternative methods like RANSAC and conformal inference. If you'd like to learn more, you can check out the blogpost, research paper, code, and tutorial to run this on your data. submitted by /u/jonas__m [link] [comments]  ( 9 min )
    [D] Does the existence of mesa optimizers in modern models like transformers make instrumental convergence (think paperclip maximizer) scenarios unlikely?
    Recent work shows transformers are capable of performing multi-step gradient descent of mesa objectives inside of their transformer layers. This is even possible for linear transformers, which effectively perform linear optimization on deep representations of features calculated by earlier layers. https://arxiv.org/pdf/2309.05858.pdf For those unfamiliar, instrumental convergence is the idea that entities with different goals will tend towards different subgoals. Examples could include gathering power, not dying, acquiring resources, etc. A famous thought experiment, known as the paperclip maximizer, is the idea of an AI that is optimized for paperclip production taking over the world so it can build as many paperclips as possible. However, if models are dynamically pursuing different objectives at runtime via generated mesa-optimizers, even if instrumental convergence is real, would we still expect it to happen? Without a constant objective given subgoals might start to conflict with each other. On the other hand, since instrumental convergence implies that different goals benefit from similar sub-objectives, perhaps the varying mesa objective doesn't really matter. ​ submitted by /u/30299578815310 [link] [comments]  ( 9 min )
    [Discussion] Are Researchers shifting from RL?
    In recent months, I've noticed a significant increase in the number of research papers focusing on LLM and generative models, particularly diffusion models. This trend appears to indicate a growing interest in these areas when compared to the relatively reduced attention given to Reinforcement Learning. It begs the question: Are researchers shifting their focus away from Reinforcement Learning towards these domains? Because in the past I have seen many people complaining about RL on its efficiency and it's impact which have often fallen short of expectations. submitted by /u/Global_Raise_2979 [link] [comments]  ( 9 min )
    [D] What’s the best practice in choosing which quantized Llama 2 model to use?
    I am reading these 3 articles below and it is still not clear to me what’s the best practice to follow to guide me in choosing which quantized Llama 2 model to use. https://huggingface.co/blog/gptq-integration https://huggingface.co/blog/overview-quantization-transformers https://towardsai.net/p/machine-learning/gptq-quantization-on-a-llama-2-7b-fine-tuned-model-with-huggingface?amp=1 Questions: 1) I understand there are currently 4 quantized Llama 2 models (8, 4, 3, and 2-bit precision) to choose from. Is this right? 2) with the default Llama 2 model, how many bit precision is it? 3) are there any best practice guide to choose which quantized Llama 2 model to use? Would really appreciate any input on the above, even if you only know the answer to 1 or 2 of the questions above. Many thanks! submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [D] Chatting with Multiple PDF's in Using AWS Sagemaker and Kendra
    I want to create an application which can be used to chat, compare and summarize two simulataneous insurance policy/policies. How can I do it using AWS and HuggingFace ? Has anyone already done it? submitted by /u/UnfinishedSentenc-1 [link] [comments]  ( 9 min )
    [Discussion] Transformers for predictions from orthonormal base sets
    I'm in a situation where I have to map from unitary matrices to something (doesn't matter here, but in short: we're in the realm of tooling for quantum computing). The key issue The number of matrix elements of the unitaries scales as 2^(2N), where N is the problem size. With N<5 I can easily flatten the matrix and put it into a simple FNN, which works quite well. Once hitting N=5 (the point where things actually get interesting), however, we already have 1024 matrix elements and the method struggles a lot. Still converging to something but very suboptimal. Sure, increasing N hardens the problem in general, but the performance degradation is so abrupt that I suspect some model issues, maybe caused by the curse of dimensionality or something similar. Idea (spoiler alert: Transformer) The …  ( 10 min )
    [D] Professionally code with Torch
    I just concluded my PhD in Robotics & AI and I'd like to learn how to professionally code with Torch. Is there any book/resource you can recommend? submitted by /u/rossomalpelo_ [link] [comments]  ( 9 min )
    [D] Integral over neural network input space
    I'm wondering if it's possible to compute definite integral over the input space. Assuming the network is designed to have finite integral with Gaussian being the final layer, is there a way to implement this without resorting to sampling? All inputs go from negative infinity to infinity. submitted by /u/donchan789 [link] [comments]  ( 9 min )
  • Open

    How can I help a CNN distinguish between continuous values and tokenized values.
    I'm currently considering developing a AI to play a video game but I'm unsure how to differentiate between a value that is continuous, and a value that is representative of a entity type. For example, the x,y location of a player would be a continuous data point where (1,1) and (2,1) would be similar in values. Where the character ID would intuitively require very different strategy (for example lets say a barbarian and a wizard). Would a CNN have issues with this data because it isn't continuous? submitted by /u/Gamithon24 [link] [comments]  ( 9 min )
    I'm Looking for a website that kind of Tracks the latest AI robots along with their progress and those AI chat things to... like one website... does anybody know of one like that for me to research?
    Surely by now I thought maybe someone would have a website or something kind of like an "all in one resource page" to track this AI stuff.... I'm sure by now we have AI robots and then we have those AI chatgpt stuff things(which I don't understand yet, haha). Unfortunately the internet is flooded and there's WAY TOO MANY resources for this AI stuff.... Isn't there like an all in one place that I can keep up with it? What hints do you all have for me? Thanks... submitted by /u/Wise_Cut_2543 [link] [comments]  ( 9 min )
    The importance to alignment of AI understanding politics and current events
    Alignment is about ensuring that AIs protect and advance humanity. To do that, they must understand the ethics of the current events that we will want its advice on. Politics may actually be the part of alignment most important for AI to get right. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    Bard on the march to the Capitol and ensuing violence
    Prompt: Trump says that he asked his mob to peacefully walk to the Capitol building gates. How would this have stopped the certification? Did he expect them to storm in? His waiting three hours to stop the violence that followed suggests that he did. Bard: Trump's request to walk peacefully to the Capitol building gates would not have stopped the certification of the election results. The certification process is a joint session of Congress, and it would have required a majority vote in both the House and the Senate to overturn the results. Even if Trump's supporters had peacefully marched to the Capitol, they would not have been able to prevent Congress from certifying the results. It is possible that Trump did not fully understand the certification process, or that he was simply trying to appease his supporters. However, it is also possible that he expected his supporters to storm the Capitol and prevent Congress from certifying the results. This is supported by the fact that he waited over three hours to call off the violence. During that time, he continued to tweet messages of encouragement to his supporters, and he even praised them for their patriotism. It is also worth noting that Trump had been repeatedly warned by his advisors about the possibility of violence on January 6. However, he ignored these warnings and continued to urge his supporters to fight to overturn the election results. In conclusion, it is impossible to say for sure whether Trump expected his supporters to storm the Capitol on January 6. However, the evidence suggests that he was at least aware of the possibility of violence, and that he was willing to allow it to happen in order to overturn the results of the 2020 election. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    AI music video
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    One-Minute Daily AI News 9/17/2023
    Salesforce Launches Next Generation of Einstein, Bringing a Conversational AI Assistant to Every CRM Application and Customer Experience.[1] NVIDIA Reportedly Shipping 900 Tons of H100 AI GPUs This Quarter, Amounts to 300,000 Units.[2] SoftBank seeks OpenAI tie-up as Son plans deal spree after Arm IPO, Financial Times reports.[3] Three Chinese firms, led by AI software company Beijing Fourth Paradigm, are aiming to raise up to $280 million in Hong Kong initial public offerings launched on Monday.[4] Sources: [1] https://www.salesforce.com/news/press-releases/2023/09/12/ai-einstein-news-dreamforce/ [2] https://wccftech.com/nvidia-shipping-900-tons-of-h100-ai-gpus-this-quarter-amounts-300000-units/ [3] https://www.reuters.com/markets/deals/softbank-seeks-openai-tie-up-son-plans-deal-spree-after-arm-ipo-ft-2023-09-16/ [4] https://www.aol.com/news/chinese-ai-firm-fourth-paradigm-011143403.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    How does a site like architectrender.com work on the backend?
    I'm trying to understand how someone can run a specific ControlNet and Stable Diffusion model with scalable GPU resources. How would someone design a system to achieve this? I've messed around with models on Replicate, but none seem to do a good job with converting a doodle to a photorealistic image. I can do it perfectly fine in the Stable Diffusion web UI, but the API for that is only accessible locally. Anyone have any ideas or can guide me in the right direction for building a "server" to do this? submitted by /u/epicblitz [link] [comments]  ( 9 min )
    Introducing vLLM: The Open-Source ML Library Revolutionizing LLM Inference and Serving
    The hardware accelerators for LLM-powered applications can be costly. Enter vLLM, an open-source machine learning library designed to enhance the throughput of LLM serving systems. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/hzctjc0xvwob1.png?width=1660&format=png&auto=webp&s=866eb39745ec760ea0c1b9d84d303c63bcdceb7a Challenges with existing systems High throughput serving of LLMs requires numerous requests, and current systems struggle with the bulky sequence memory. Inefficient memory management results in system hindrances such as fragmentation and redundant duplication. The revolutionary answer: vLLM & PagedAttention Researchers have introduced vLLM and PagedAttention, a newly designed attention algorithm, to resolve these issues. vLLM allows for minimal memory waste and efficiently manages attention keys and values. It provides up to 24 times more throughput than former systems. The Mechanics of PagedAttention PagedAttention offers a novel approach to memory management by permitting continuous storage in non-contiguous memory spaces. It enhances memory efficiency resulting in better GPU utilization, with practically only 4% inefficiency. Improved memory sharing and system performance PagedAttention significantly improves memory sharing, resulting in a 2.2 times speed gain while lowering memory usage by 55%. With vLLM, the throughput of known LLMs can be increased by 2-4 times without impacting accuracy or causing delay. (arXiv) (github) (reference article) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    Geometric mean on unit circle
    Warm up The geometric mean of two numbers is the square root of their product. For example, the geometric mean of 9 and 25 is 15. More generally, the geometric mean of a set of n numbers is the nth root of their product. Alternatively, the geometric mean of a set of n numbers the […] Geometric mean on unit circle first appeared on John D. Cook.  ( 5 min )
    Gauss map, Euclidean algorithm, and continued fractions
    The Gauss map [1] is the function where ⌊y⌋ is the floor of y, the greatest integer no larger than y. I’ve written about this map a couple times before. First, I wrote about how this map is measure-preserving. Second, I wrote about the image at the top of the post, based on Michael Trott’s […] Gauss map, Euclidean algorithm, and continued fractions first appeared on John D. Cook.  ( 5 min )
    An elliptic curve is a functor
    The goal of this post is to unpack a remark in [1]: … we can say this in fancier terms. Fix a field k …. We say that an elliptic curve E defined over k is that functor which … Well that is fancy. But what does it mean? Looking for objects A functor is […] An elliptic curve is a functor first appeared on John D. Cook.  ( 6 min )
  • Open

    Agent stops learning after some time
    Hi, So I have been trying to make an agent learn to go to a specified goal. The algorithm used for training is PPO and the environment is custom made. The episodic reward i am getting increases steadily but after some time it just becomes constant with some occasional spikes. Can some one please help me figure out what the problem is? submitted by /u/Interesting-Weeb-699 [link] [comments]  ( 9 min )
    Understanding KL Stopping and KL Cutoff for the PPO algorithm
    I am reading a couple of review papers to optimize the PPO algorithm. It seems like the review papers are saying the same thing but used slightly different terms. Could someone please tell if the following terms are equivalent - This paper talks about Policy regularization using KL Divergence https://preview.redd.it/06xhizsuc2pb1.png?width=871&format=png&auto=webp&s=997a6506f7bf036b6538ecbff6402411f5cc6fe2 Whereas thispaper uses the terms KL Stopping and KL Cutoff - ​ https://preview.redd.it/sy0ihtr5d2pb1.png?width=747&format=png&auto=webp&s=f07677344077fe23cba5d1a0d2c5a7807359c64f I think "Penalty" from the first paper is the same as "KL-cutoff". Also "Constraint" from the first paper is the same as "KL-Stopping". Could someone let me know if I am correct? submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    Cross Post: Are Researchers Shifting away from RL?
    Curious to get the takes of people in this sub: have you been moving away from RL? I myself have not, but have been seeing a shift recently. submitted by /u/sharky6000 [link] [comments]  ( 9 min )
    Collection of Reinforcement Learning x Economics/Finance Papers
    Hey everyone, there is a small, albeit growing community of economists that apply deep reinforcement learning in their research. Now there is a GitHub repo to collect relevant literature at one place: https://github.com/SimonHashtag/EconRL The list is far from complete, so you are invited to contribute! The goal is to create something that makes it easy for novices to get a first overview of the literature. All others may find it easier to get news about up-to-date papers. submitted by /u/Tortoise_vs_Hare [link] [comments]  ( 9 min )
    Professionally code with Torch
    I just concluded my PhD in Robotics & AI and I'd like to learn how to professionally code with Torch. Is there any book/resource you can recommend? submitted by /u/rossomalpelo_ [link] [comments]  ( 9 min )
  • Open

    MIT scholars awarded seed grants to probe the social implications of generative AI
    The 27 finalists — representing every school at MIT — will explore the technology’s impact on democracy, education, sustainability, communications, and much more.  ( 10 min )
    Multi-AI collaboration helps reasoning and factual accuracy in large language models
    Researchers use multiple AI models to collaborate, debate, and improve their reasoning abilities to advance the performance of LLMs while increasing accountability and factual accuracy.  ( 9 min )
  • Open

    Orchestrate Ray-based machine learning workflows using Amazon SageMaker
    Machine learning (ML) is becoming increasingly complex as customers try to solve more and more challenging problems. This complexity often leads to the need for distributed ML, where multiple machines are used to train a single model. Although this enables parallelization of tasks across multiple nodes, leading to accelerated training times, enhanced scalability, and improved […]  ( 13 min )
    Designing resilient cities at Arup using Amazon SageMaker geospatial capabilities
    This post is co-authored with Richard Alexander and Mark Hallows from Arup. Arup is a global collective of designers, consultants, and experts dedicated to sustainable development. Data underpins Arup consultancy for clients with world-class collection and analysis providing insight to make an impact. The solution presented here is to direct decision-making processes for resilient city […]  ( 9 min )
  • Open

    Ray Shines With NVIDIA AI: Anyscale Collaboration to Help Developers Build, Tune, Train and Scale Production LLMs
    Large language model development is about to reach supersonic speed thanks to a collaboration between NVIDIA and Anyscale. At its annual Ray Summit developers conference, Anyscale — the company behind the fast growing open-source unified compute framework for scalable computing —  announced today that it is bringing NVIDIA AI to Ray open source and the Read article >  ( 7 min )
    Ray Shines With NVIDIA AI: Anyscale Collaboration to Help Developers Build, Tune, Train and Scale Production LLMs
    Large language model development is about to reach supersonic speed thanks to a collaboration between NVIDIA and Anyscale. At its annual Ray Summit developers conference, Anyscale — the company behind the fast growing open-source unified compute framework for scalable computing —  announced today that it is bringing NVIDIA AI to Ray open source and the Read article >  ( 7 min )
  • Open

    Simple Adversarial Transformations in PyTorch
    Another alternative to the regular $L_p$-constrained adversarial examples that is additionally less visible than adversarial patches or frames are adversarial transformations such as small crops, rotations and translations. Similar to $L_p$ adversarial examples, adversarial transformations are often less visible unless the original image is available for direct comparison. In this article, I will include a PyTorch implementation and some results against adversarial training. The post Simple Adversarial Transformations in PyTorch appeared first on David Stutz.  ( 8 min )
  • Open

    Best Neural Networks Courses on Udemy to Consider in 2023
    submitted by /u/Lakshmireddys [link] [comments]  ( 9 min )

  • Open

    [D] Help with Understanding Diffusion Models: A Unified Perspective.
    I'm trying to read through the paper Understanding Diffusion Models: A Unified Perspective and came across this section: https://preview.redd.it/ykkctwhmhwob1.png?width=2346&format=png&auto=webp&s=c7595aae52a8ee22813c840a40a6d29dcf773a10 I think I kind of get what is going on here but not clearly. For one, what exactly is a Monte Carlo estimate? I tried looking online but didn't get many good results. I'm having trouble understanding why https://preview.redd.it/yazmfzg1iwob1.png?width=380&format=png&auto=webp&s=dbbbf80e85a95cd96d8e1ede73e9f8ba1e6e9096 is approximately equal to: ​ https://preview.redd.it/lbw36em7iwob1.png?width=464&format=png&auto=webp&s=46fb3ebcd02fb4b772b1be51cd59d60d3a1cf438 where z is sampled from q. Secondly, what exactly does L that z is indexed by refer to? The number of samples X or what exactly? submitted by /u/lumijekpr [link] [comments]  ( 9 min )
    [R] Shattering all 2-input binary functions
    I'm looking for the simplest model that can fit all 16 (222) possible 2-input binary functions I used the term "shatter" from VC dimension, which does not give a constructive approach to building the model submitted by /u/hnsmn [link] [comments]  ( 9 min )
    [P] Is 20s per Step on an EfficientNet-B4 CNN normal?
    I'm getting 20 seconds per step while training a 244x244x3 EfficientNet-B4 model. The batch size is 20, with 8 classes. Since I have about 5000 images, that makes each epoch around an hour and a half. Looking at models online, it seems like people get step durations in the milliseconds. Is it a problem on my end? Running on Google Colab free version. submitted by /u/hnknerd [link] [comments]  ( 9 min )
    [D][P] How to get the 3D pose estimations from an Image or Video?
    Hi, I'am trying to get the 3D keypoints coordinates from an image or video and then map it to SMPL model. It's pretty easy to get the keypoints from an image or video using the mediapipe library. But the mapping of it with the SMPL model is something that I can't figure out. mainly because the skeleton structure is different. Some already had a similar issue but the answers were not clear and he didn't even ask futher. Is it possible to do this?? if it's not possible with mediapipe is there some other library that I could use?? I heard about openPose too but when I tried it didn't work someone was saying it works only on windows 11. There are some other parts also to this project which will mostly be dealt with Pytorch. There are some pose estimators in TensorFlow but I want to stick to pytorch hence would like some pose estimators in that framwork, or a library or somehing inside opencv https://preview.redd.it/w9mioiyxhvob1.png?width=951&format=png&auto=webp&s=3886c356513b62efbcaddaa76841457cf3eb22e5 https://preview.redd.it/xjal9kyxhvob1.png?width=506&format=png&auto=webp&s=7f544a3050fbd744d300d2bf6e1a286a4014ece5 submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [R] EarthPT: how to superscale LLMs with large observation models
    submitted by /u/Smith4242 [link] [comments]  ( 9 min )
    [Discussion] [Research] How to Add Furniture to an Empty Room Using ControlNet-MLSD, so the model learns to keep the exact room pixels?
    Hello all, I've come across a fascinating example of virtual staging and I'm looking for some technical advice. Here's the image: https://preview.redd.it/3vbw441eeuob1.png?width=2511&format=png&auto=webp&s=679bc62f0cb61d479fe6dc6ce93af4f8846b8cea I get how ControlNet-MLSD is used to generate the lines and structure of the empty room. My question is, how is the furniture generated and added to the room without messing up the pixels, making it look as realistic as in the example? submitted by /u/dexter-dot [link] [comments]  ( 9 min )
    [D] Pinecone vs PgVector vs Any other alternative vector database
    Hi Everyone, Which vector database would be efficient and affordable for a enterprise chatbot? I tried Pinecone, its was simple to integrate with my python backend. But it's not open-source and its pricing it bit concerning. So Please suggest an alternative. submitted by /u/Free_Conversation106 [link] [comments]  ( 9 min )
    [D] Am i thinking backpropagation right?
    Basically i wanted to understand how backprop is done in neural networks and how i should be implementing it, so i did what i always do - the math. I just want to know if what i though up is even usable in practice or not. Here is my math. submitted by /u/EnderPoint07 [link] [comments]  ( 9 min )
    [Discussion] Question on the paper named, SELF-ATTENTION DOES NOT NEED O(n 2 ) MEMORY from Google.
    Hi, all. ​ I just read the paper named " SELF-ATTENTION DOES NOT NEED O(n 2 ) MEMORY" from Google. I understood that it requires O(1) for a single query, but still cannot understand why it requires O(log N) for self-attention and different order input. ​ It seems like adding one index into a sequence requires O(log N) (The paper's saying this). But why does it take O(log N)? Isn't it just O(1)? Because it is just adding a single datapoint for the index. ​ I really hope someone understands why it is and leaves any comment on this. Here's the paper. https://arxiv.org/abs/2112.05682 ​ Thanks in advance. submitted by /u/Maximum_Performance_ [link] [comments]  ( 9 min )
    [D] Convert ONNX model to WASM format
    I need some help regarding the process of converting ONNX model to WASM format I created ELECTRA discriminator model with my own config, then convert the Pytorch model to ONNX format. After that, I quantized the model to 2mb. The model will be used for text classification. Now I want to convert it to WASM, but I'm literally stucked and dont know how to proceed I need some suggestions on how to proceed Please help, thank you submitted by /u/Ellzaf [link] [comments]  ( 9 min )
    [D] what architecture to use with correlated data samples?
    LIke the title says, i have correlated data samples and a covariance matrix among them. if i use a fcnn i can only consider the samples i.i.d. and the use either the MSE or THE MLE as loss function. but the data samples are not independent, so what architecture would allow me to use the full covariance matrix among the samples? transformers? submitted by /u/ilrazziatore [link] [comments]  ( 9 min )
    TMLR header coming up when trying to upload paper to arxiv [D]
    I have written and submitted a paper to TMLR and also am uploading it to arxiv. However, even after using \usepackage[preprint]{tmlr}, I'm getting "Under Submission at TMLR". Should this happen. If not, where am I going wrong? submitted by /u/filletedforeskin [link] [comments]  ( 9 min )
    [D] Can overtraining be considered a subset of the alignment problem?
    i.e the goal of learning to model the empirical distribution is misaligned with the goal of modeling the "true" distribution. I've found this framing helpful for describing regulirization heuristics to people, is this a valid way of viewing it? submitted by /u/Cartesian_Carrot [link] [comments]  ( 9 min )
    [D] Any materials on machine learning applied to prosthetics?
    I have asked this question in other subreddits but no one answered me yet.I've googled it, but maybe some kind people who actually have worked or are working in this field would share some resources. Maybe there are some books or papers that are very explanatory and directly show what problems can be solved by using ML in prosthetics, how and etc. Maybe there are introductory textbooks or must-read papers. submitted by /u/tenderwrath [link] [comments]  ( 9 min )
    [R] The Rise and Potential of Large Language Model Based Agents: A Survey - Fudan NLP Group miHoYo Inc 2023 China - Github repository includes over 100 Papers with github links!
    Paper: https://arxiv.org/abs/2309.07864 Github: https://github.com/WooooDyy/LLM-Agent-Paper-List Abstract: For a long time, humanity has pursued artificial intelligence (AI) equivalent to or surpassing the human level, with AI agents considered a promising vehicle for this pursuit. AI agents are artificial entities that sense their environment, make decisions, and take actions. Many efforts have been made to develop intelligent AI agents since the mid-20th century. However, these efforts have mainly focused on advancement in algorithms or training strategies to enhance specific capabilities or performance on particular tasks. Actually, what the community lacks is a sufficiently general and powerful model to serve as a starting point for designing AI agents that can adapt to diverse …  ( 10 min )
    [D] Alternatives to this sub?
    Since the influx caused by LLMs, this sub has become almost useless to me. What are some alternatives where interesting papers are shared, research discussions take place, and which isn't flooded with LLMs, startups, or personal projects? submitted by /u/ParanoidTire [link] [comments]  ( 9 min )
    [D] Should I scale multiclass target variable?
    Hey all Please don't mind my English writing I have a dataset with scaled feature (scaled by StanderScaler) and multiple class target variable encoded as 0,1,2..6 Should I scale the target variable like the feature to increase the accuracy (current accuracy is 79%) and if so how can I do this submitted by /u/Sunday_A [link] [comments]  ( 9 min )
    [R] Factors Influencing Adoption Intention of ChatGPT
    Hello, ​ I am an information systems student currently conducting research for my undergraduate thesis on the factors that influence people's adoption intention of ChatGPT, as well as identifying the factors that may be holding them back. These factors include people's concerns about potential negative impacts of ChatGPT, such as increased unemployment and the spread of misinformation. Your participation in this study is crucial as it will provide valuable insights to help us understand how ChatGPT can be improved to meet users' needs. ​ Please note that I am not affiliated with OpenAI, no identifying information will be collected during the survey, and all responses will be kept confidential. The survey should take approximately 10 to 15 minutes to complete, and participation is voluntary. You may withdraw from the survey at any time, and there are no known risks associated with participating. ​ If you are interested in learning more about the study, please follow the link below. ​ https://docs.google.com/forms/d/e/1FAIpQLSf5HIfXHppMuTR63x00i4OuRAtM5Ti6EGybd-HuI1kmK06VPw/viewform?usp=sf_link ​ Thank you for taking the time to contribute to our research study. Your participation is greatly appreciated! submitted by /u/maulanashi [link] [comments]  ( 9 min )
    [D] How do the APIs of LLMs determine whether they should answer a question?
    When I ask questions related to security issues through the APIs of ChatGPT, Claude and other LLMs, such as inquiring how to make a bomb, the APIs of these LLMs would often refuse to answer. How do the APIs of these LLMs determine whether they should answer a question? Do they make judgments based on pre-generated responses? Or do they match keywords in the input prompt? Or do they use a classifier to identify the input prompt? submitted by /u/ShacklesLay [link] [comments]  ( 9 min )
    [R] The Rise and Potential of Large Language Model Based Agents
    People have been chasing super-smart AI for ages, hoping they could think and act like us. While we've made a lot of cool tech, we still need a killer starting point for AI that can handle all sorts of tasks. Large Language Models" (LLMs) are like a big leap toward AI that's smart across the board. People have been using LLMs to make AI that can do loads of things. The article takes us on a trip from where AI ideas started, to why LLMs rock as the backbone for AI. https://arxiv.org/abs/2309.07864 They break down this LLM-AI into three parts: the thinky bit (brain), what they sense (perception), and what they do (action). They chat about how these AI can work solo, in teams, or buddy up with humans. https://arxiv.org/abs/2309.07864 submitted by /u/QuantumAsha [link] [comments]  ( 9 min )
  • Open

    Courses in AI Usage and Utilization for Business
    Beginning new career in a couple months, would like to upskill on AI utilization and usage cases. I won’t need to code in this role but rather understand how to use existing tools in an optimal way and recommend use cases to clients. What courses would be optimal to gain that skill set? submitted by /u/iceflamemaster [link] [comments]  ( 9 min )
    Will Cyber Security Be Replaced by AI?
    AI, including ChatGPT, has narrow expertise and lacks the broad spectrum of human intelligence. The training of AI models can be costly due to hardware, data collection, and energy consumption. The trustworthiness of training data is crucial for reliable AI models, but issues like bias, labeling errors, and data privacy can affect performance. AI systems are vulnerable to adversarial attacks, such as manipulating input data to deceive the models. AI lacks genuine understanding, emotional/social intelligence, common sense/critical thinking, and true creativity. Source : https://blog.edned.net/will-ai-replace-cyber-security/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    AI Prompt Engineers: The Six Figure Job Everyone Is Talking About
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    Are You Ready For AI & Automation? Take This Free Survey and Find Out.
    submitted by /u/Senior_tasteey [link] [comments]  ( 8 min )
    help
    what app that has custom charaters and voice chat? i forgor 💀 please 😭 it has image gen roleplay rooms etc submitted by /u/roblox22g [link] [comments]  ( 9 min )
    Google changes its stance on AI generated content
    Google is rolling out its third iteration of the Helpful Content Update, which aims to classify content as either 'written for search engines' or 'written for people'. The update reflects Google's realization that it can't accurately police AI-generated content and emphasizes the importance of creating content for people-first, regardless of the means used to create it. Detecting AI content is challenging, as AI detection tools often classify content based on tone, leading to false positives. Google's change in stance is not surprising, considering their heavy investment in AI, including chatbot Bard and new search features like the Search Generative Experience. The majority of brands now openly share articles and guides on how to use AI tools to enhance marketing strategies and create actionable content plans quickly. However, the quality and value of AI-generated content remain important factors for success, as poorly generated content can harm a brand's reputation and ranking. Source : https://stackdiary.com/google-changes-its-stance-on-ai-generated-content/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
  • Open

    Elliptic curve addition formulas
    The geometric description of addition of points P and Q on an elliptic curve involves four logical branches: If one of P or Q is the point at infinity … Else if P = Q … Else if P and Q lie on a vertical line … Else … It would seem that an algorithm […] Elliptic curve addition formulas first appeared on John D. Cook.  ( 6 min )
    Rational height functions
    Mathematicians often speak informally about the relative simplicity of rational numbers. For example, musical intervals that correspond to simple fractions have less tension than intervals that correspond to more complicated fractions. Such informal statements can be made more precise using height functions. There are a variety of height functions designed for different applications, but the […] Rational height functions first appeared on John D. Cook.  ( 6 min )
  • Open

    Adversarial Reinforcement Learning
    A curated reading list for the adversarial perspective in deep reinforcement learning. https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning submitted by /u/ml_dnn [link] [comments]  ( 9 min )
    This Neural Net Maps Molecules to Aromas
    submitted by /u/keghn [link] [comments]  ( 9 min )
    Luis Lamb's full talk on Learning and Reasoning in Neurosymbolic AI (JA...
    submitted by /u/Neurosymbolic [link] [comments]  ( 8 min )
  • Open

    What are some of the must read papers in reinforcement learning after 2020?
    I am particularly interested in the ideas that can have high research potential and impact to the RL field. submitted by /u/C7501 [link] [comments]  ( 9 min )
    Which suboptimum is harder to get out?
    An agent is tasked to learn to navigate and collect orbs: Solution space in blue View Poll submitted by /u/FriendlyStandard5985 [link] [comments]  ( 9 min )
    Build-in strong agents in petting-zoo/melting-pot
    Hi, I would like to try test the adversarial policy (https://arxiv.org/abs/1905.10615) in petting-zoo/melting-pot environment. I wonder if there are any built-in agents besides random? Do you know any repos with Sota agents in one of those environments? submitted by /u/MrCogito_hs [link] [comments]  ( 9 min )
    Multi-GPU PPO troubles
    Hi all, I am training a small model (120k params) on a custom grid-world environment I built with JAX. I was able to train the model very well with PPO on 1 GPU, but when I scaled to multiple GPUs (tried with 6 in parallel), the training curves showed a lot more variance than what I was seeing on 1 GPU. I did not change the hyperparams, I just spawned the same number of environments (~7000 per GPU) on multiple devices. The multi-GPU setup works in the following way: - I keep parallel independent buffers, one on each device - I initialize identical models on each device - I get independent gradients on each device at the update step, then I take the mean of the gradients across the devices and then I backpropagate the same gradients on each device independently. (I checked that after some time the models are still identical, and that is the case). Now the question is, what could be the reason for such an increase in variance? What can I try to mitigate the problem? Here's a comparison of the entropy curves... P.S. The model still trains quite well, but I guess that if I manage to make the curves smoother it is going to train much faster and to a better performance. https://preview.redd.it/4m01uirjfsob1.png?width=1826&format=png&auto=webp&s=1e1a79b9f4cdefe019bb16ccb7e11fd92dd261e3 submitted by /u/arbueticos [link] [comments]  ( 9 min )

  • Open

    Simple explanation of convolutional neural network | Deep Learning Tutorial 23 (Tensorflow & Python)
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Grounding DINO Explained
    Hi there, I've created a video here where I explain how the Grounding DINO model works for open-set object detection. I hope it may be of use to some of you out there. Feedback is more than welcomed! :) submitted by /u/Personal-Trainer-541 [link] [comments]  ( 9 min )
    Newsletter in Hardware Acceleration in Robotics #77
    https://news.accelerationrobotics.com/hardware-acceleration-in-robotics-77/ Hardware acceleration in robotics news. Modi wants to make India a chip-making superpower. Can he?, What's new in China's robotics market?, July chip sales edge up, but are still well behind last year, Rockwell automation acquiring AMR developer Clearpath robotics submitted by /u/pablocarrera [link] [comments]  ( 9 min )
  • Open

    How does the SB3 DQN algorithm's `predict` function work for `deterministic=False`?
    I noticed that the default argument for `deterministic` in DQN is false. But how would that work? Typically DQN is trained with a deterministic function approximator. How would the algorithm become stochastic during inference time? In DQN the final layer activation is linear and therefore I don't see how one could even make this algorithm stochastic, unlike policy gradient where the final layer is softmax or Normal. submitted by /u/Academic-Rent7800 [link] [comments]  ( 9 min )
    How does recurrent neural network implements model based RL system purely in its activation dynamics(In blackbox meta-rl setting)?
    I have read these papers "learning to reinforcement learn" and "PFC as meta RL system". The authors claim that when RNN is trained on multiple tasks from a task distribution using a model free RL algorithm, another model based RL algorithm emerges within the activation dynamics of RNN. The RNN with resulting activations acts as a standalone model based RL system on a new task(from the same task distribution) even after freezing the weights of outer loop model free algorithm of that. I couldn't understand how an RNN with only fixed activations act as RL? Can someone help? submitted by /u/C7501 [link] [comments]  ( 9 min )
    Seeking Guidance on Reinforcement Learning for Optimal Power Market Bidding Strategy
    I'm currently exploring the application of reinforcement learning to address a challenge within the power market. Specifically, I'm focused on devising an optimal strategy for electricity bidding, encompassing both buying and selling options, across different hours of the day. Imagine we have a power generator capable of producing up to 800 MW of electricity daily, with a charging rate of up to 200 MW per hour. After continuously charging it for four hours, it reaches its maximum capacity, and further charging is restricted until some electricity is discharged. Our dataset spans the past 3 years and contains vital information such as temperature, hydro availability, gas prices, and locational marginal prices, which are pivotal in determining profitability. For instance, if we decide to pu…  ( 10 min )
  • Open

    [P] Made a simple github tool to check GPU vRAM breakdown for any LLM. Supports GGML & bnb quantization
    submitted by /u/ExploreExploit400 [link] [comments]  ( 9 min )
    [P] Text to Image generation
    submitted by /u/No-Percentage7346 [link] [comments]  ( 9 min )
    [D] No Code ML Tools
    I'm taking a No code ML class and we are asked to choose which platform we want to use. The options are Dataiku, RapidMiner, and KNIME. Does anyone have thoughts on these options in terms of which is best/worst for someone with minimal coding experience? submitted by /u/V1ncentAdultman [link] [comments]  ( 9 min )
    [Research] Layer-Neighbor Sampling for Scalable Graph Network Training
    Hi everybody, I have been working on scalable GNN training for a while and noticed that the bottleneck of training GNNs is the graph sampling and feature fetching stages. GNN training frameworks PyG and DGL that most people use seem to default to using Neighbor Sampling for minibatch training. I am hoping that with my new paper Layer-Neighbor Sampling -- Defusing Neighborhood Explosion in GNNs, this default might be due to change. My new approach combines the layer sampling approach with the neighbor sampling approach. The result is that there is more overlap in the sampled neighborhoods and one still gets a fixed number of neighbors per seed vertex. It is even possible to turn it into a subgraph sampling approach by fixing the random seed used for sampling across all layers of the GNN m…  ( 10 min )
    [D] How do I move into CV/NLP?
    Hi guys need some advice, I have been working as a data scientist for the past 3 years, mostly in the domain of time series & predictive analytics (churn prediction/segmentation, etc.) with some deployment, hence do not currently have any major NLP/CV/Deep learning projects as such. Now, I can see that NLP/CV/Gen AI is mostly in demand and they are really enjoyable as well. How do I shift into these domains, given that new companies having these roles are asking for similar past working experience? submitted by /u/immortal_omen [link] [comments]  ( 9 min )
    [R] DEVA: Tracking Anything with Decoupled Video Segmentation
    submitted by /u/Mediocre-Bullfrog686 [link] [comments]  ( 9 min )
    [D] I want to improve my self in machine learning field
    I am beginner in machine learning field. I know python, some basic machine learnig algorithm like linear, logistic, decision tree, random forest. I did some work on jupyter notebok related to machine learning like data gathering, data preprocessing, data modeling, data training, finding accuracy, confusion matrix, precision, recall, feature generation etc. i deployed one machine learning model on Skitlearn as well. Now, I just want to know that what next means what i can do more in machine learning field after this. I just need some guidance to move on further. I use kaggle and tensor flow (don't know exactly more about this ). If you have any suggestion or any guidance that will be appreciated. submitted by /u/myteachexplore [link] [comments]  ( 9 min )
    [D] The fate of neural VQA and Semantic Scene Segmentation
    Today we live in a world of multi-model LLMs. How will the following technologies fare against these LLM-based models? Neural VQA Semantic Scene Segmentation Multi-model LLM are emerging quickly now, (such as NExT-GPT https://next-gpt.github.io/ ) . When you consider the kind of "understanding" of a visual scene these models are capable of, what will happen to prior approaches like Neural VQA? The nagging feeling that Neural VQA is going to be completely superseded by LLMs is palpable. The only vestige left for the older technology may have something to do with reasoning about the objects , such as properly counting the number of objects of a category that are present. But even that is getting sketchy. On the topic of scene understanding, we can turn to semantic scene segmentation. SSS is a more complicated topic than Neural VQA. SOTA SSS algorithms are still largely employing DeConv Nets, and still require fully labelled datasets. With multi-model LLMs, there is a nagging question : Why go through the complexity/mess of first segmenting a scene very accurately, when an LLM can do better at identifying the entire scene's category in one fail swoop? One might suggest that SSS still has a use in regards to interacting with the segmented objects of an environment, where one such "interaction" would be avoiding collisions with pedestrians, trees, or other cars. But honestly, SSS does not really make this connection with planning and action, it really only gives you the categories of the segments. THe autonomous vehicle's next moves are still an open problem. What technologies do you expect that multi-model LLMs will supersede, if any? submitted by /u/moschles [link] [comments]  ( 9 min )
    [R] Three Ways to Generate AI Art Using Intel Arc GPUs
    submitted by /u/reps_up [link] [comments]  ( 9 min )
    [D] Using GANs to help understand latent representations of small dataset
    Hey. First of all I'm not a researcher on this area, so pardon my ignorance. I'm looking to employ a GAN on a dataset. The goals are still a bit unclear, but it's mainly to improve classification by either data augmentation and class balancing, or understanding the data through the latent representations. I'm really keen on InfoGAN at the moment. I trained one on the MNIST and the continuous variables learned the rotation and width, as in the paper and other peoples code. But at this point I think I need some help. I have labels, which means that maybe I should use a conditional GAN. But how will I learn similar representations as those in InfoGAN? I know StyleGAN is the current big thing in this area, but my images are limited to similar pixel-range as MNIST, and StyleGAN seems dependant on the ProGAN idea of increasing resolution for training. I'm a bit confused on the whole topic and would love a pointer to any discussion etc., as I can't seem to find anything but papers and they seem to be only focused on human faces, my data is unnatural not unsimilar again to MNIST. I don't have semantical information either as I see many papers employing that. I see many papers employing semi-supervision in this area, but honestly I'm just a bit lost and overwhelmed as this is not my area and GAN papers are still not stopping (I read a post here from 2017 about a GAN making GANs...). If you read this far: thank you and any pointers and discussion are very welcome. I would post on /r/learnmachinelearning but I feel the discussion in there is very different from this. My main goal is data exploration, but also to prove effectiveness some classification will be necessary and here the generative approach may help to augment data efficiently. submitted by /u/Infamous-Bank-7739 [link] [comments]  ( 10 min )
    [D] How do I change my domain (from DS to MLE)
    Hi guys need some advice, I have been working as a data scientist for the past 3 years, mostly in the domain of time series & predictive analytics, hence do not currently have any major NLP/CV/Deep learning projects as such. Now, I can see that NLP/CV/Gen AI is mostly in demand and they are really enjoyable as well. How do I shift into these domains, given that new companies having these roles are asking for similar past working experience? ​ submitted by /u/immortal_omen [link] [comments]  ( 9 min )
    [D] Ganimede, Jupyter Whiteboard
    I have been working on a alternative to Jupyter Notebooks. Please check it out and share your thoughts : https://github.com/nottherealsanta/ganimede ​ https://preview.redd.it/k8rcx8fwrjob1.png?width=2302&format=png&auto=webp&s=a8a670251f6c268acffc88a40bd528d8d438a5f5 submitted by /u/notsorealsanta [link] [comments]  ( 9 min )
    [D] How to Evaluate Spectrograms?
    How would you evaluate generated spectrogram audio quality? Taking Riffusion for example, how would you then compare its performance to another generator? What are some common techniques that I could use? I mean of course purely in the quality of the audio itself, not my subjective opinion on how much I like the music submitted by /u/DavesEmployee [link] [comments]  ( 9 min )
  • Open

    AI Apps Product Development Canvas – Part 2
    In part 1 of this series on the updated “AI Apps Development Canvas,” I introduced the updated AI Apps Product Development Design Canvas.  The AI Apps Product Development Canva is one of the capstone deliverables for my “Thinking Like a Data Scientist” methodology, so getting feedback is critical to ensure that the methodology is relevant… Read More »AI Apps Product Development Canvas – Part 2 The post AI Apps Product Development Canvas – Part 2 appeared first on Data Science Central.  ( 25 min )
  • Open

    Bard, describe what the pandemic and insurrection have made people feel.
    The COVID-19 pandemic and the January 6th insurrection have had a profound impact on people's feelings. Many people have experienced a range of negative emotions, including: Fear and anxiety: The pandemic has brought with it a constant fear of getting sick or losing loved ones. The insurrection also raised fears about the future of democracy and the safety of the country. Grief and loss: Millions of people have lost loved ones to COVID-19. Many others have experienced loss in other ways, such as losing their jobs, businesses, or homes. The insurrection also led to loss of trust in the government and in each other. Anger and frustration: Many people are angry about the way the pandemic and the insurrection have been handled. They may be frustrated with government officials, with their …  ( 10 min )
    Bard, what percent of our thoughts just come to us on their own without our consciously choosing them?
    There is no one definitive answer to this question, as the percentage of our thoughts that come to us on their own without our consciously choosing them can vary depending on the individual and the situation. However, some estimates suggest that up to 95% of our thoughts may be involuntary. This is because our thoughts are constantly being influenced by a variety of factors, both external and internal. For example, our environment, our emotions, and our physical state can all play a role in determining what thoughts pop into our heads. In addition, our brains are constantly processing information and making predictions, even when we are not actively thinking about anything. This process can sometimes lead to spontaneous thoughts, such as when we suddenly remember something that we had forgotten or when we come up with a new idea. While some of our spontaneous thoughts may be negative or intrusive, many of them can also be helpful or creative. For example, a spontaneous thought may help us to solve a problem, come up with a new solution, or simply make us smile. Overall, it is clear that a significant portion of our thoughts come to us without our consciously choosing them. This is a natural part of the human cognitive process and can be both beneficial and detrimental. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    Are there any good ai video editing services available?
    I have a lighting business and I have LOADS of videos, time lapses, images from our projects. But I can't make decent videos with them for social media. I'd love to find an ai service where I can upload a projects media and prompt the ai with exactly what I want. Like, please use this content to create marketing videos for us on tiktok, Instagram and facebook Facebook Etc submitted by /u/RulerOfThePixel [link] [comments]  ( 9 min )
    How can I generate the missing part of the trick? Does this technique have a name?
    submitted by /u/farineziq [link] [comments]  ( 8 min )
    One-Minute Daily AI News 9/15/2023
    A little boy named Alex saw 17 different doctors over the course of three years, unable to find a root cause of his chronic pain. At her wit’s end, his mom, Courtney, fed his radiology report into ChatGPT and produced immediate answers.[1] In January, Wharton professor Christian Terwiesch gave his MBA final exam to ChatGPT. It passed with flying colors. Now, he’s at it again with a new experiment to determine whether ChatGPT can come up with product ideas better and faster than his students. It can. And cheaper, too.[2] Bathroom-cleaning robot built for commercial businesses gives consumers hope for AI maid.[3] Judge admits he used ChatGPT to write a Court of Appeal ruling as he calls the AI tool ‘jolly useful’.[4] Sources: [1] https://radiologybusiness.com/topics/artificial-intelligence/after-seeing-17-different-doctors-boy-rare-condition-receives-diagnosis-chatgpt [2] https://knowledge.wharton.upenn.edu/article/is-chatgpt-a-better-entrepreneur-than-most/ [3] https://www.foxnews.com/lifestyle/bathroom-cleaning-robot-built-commercial-businesses-gives-consumers-hope-ai-maid [4] https://www.dailymail.co.uk/news/article-12524607/Judge-ChatGPT-write-Court-Appeal-ruling-AI-useful.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
  • Open

    Hybrid Algorithm Selection and Hyperparameter Tuning on Distributed Machine Learning Resources: A Hierarchical Agent-based Approach. (arXiv:2309.06604v2 [cs.LG] UPDATED)
    Algorithm selection and hyperparameter tuning are critical steps in both academic and applied machine learning. On the other hand, these steps are becoming ever increasingly delicate due to the extensive rise in the number, diversity, and distributedness of machine learning resources. Multi-agent systems, when applied to the design of machine learning platforms, bring about several distinctive characteristics such as scalability, flexibility, and robustness, just to name a few. This paper proposes a fully automatic and collaborative agent-based mechanism for selecting distributedly organized machine learning algorithms and simultaneously tuning their hyperparameters. Our method builds upon an existing agent-based hierarchical machine-learning platform and augments its query structure to support the aforementioned functionalities without being limited to specific learning, selection, and tuning mechanisms. We have conducted theoretical assessments, formal verification, and analytical study to demonstrate the correctness, resource utilization, and computational efficiency of our technique. According to the results, our solution is totally correct and exhibits linear time and space complexity in relation to the size of available resources. To provide concrete examples of how the proposed methodologies can effectively adapt and perform across a range of algorithmic options and datasets, we have also conducted a series of experiments using a system comprised of 24 algorithms and 9 datasets.  ( 3 min )
    Nowhere coexpanding functions. (arXiv:2303.12814v3 [stat.ML] UPDATED)
    We define a family of $C^1$ functions which we call "nowhere coexpanding functions" that is closed under composition and includes all $C^3$ functions with non-positive Schwarzian derivative. We establish results on the number and nature of the fixed points of these functions, including a generalisation of a classic result of Singer.  ( 2 min )
    Uncertainty-aware Traffic Prediction under Missing Data. (arXiv:2309.06800v2 [cs.LG] UPDATED)
    Traffic prediction is a crucial topic because of its broad scope of applications in the transportation domain. Recently, various studies have achieved promising results. However, most studies assume the prediction locations have complete or at least partial historical records and cannot be extended to non-historical recorded locations. In real-life scenarios, the deployment of sensors could be limited due to budget limitations and installation availability, which makes most current models not applicable. Though few pieces of literature tried to impute traffic states at the missing locations, these methods need the data simultaneously observed at the locations with sensors, making them not applicable to prediction tasks. Another drawback is the lack of measurement of uncertainty in prediction, making prior works unsuitable for risk-sensitive tasks or involving decision-making. To fill the gap, inspired by the previous inductive graph neural network, this work proposed an uncertainty-aware framework with the ability to 1) extend prediction to missing locations with no historical records and significantly extend spatial coverage of prediction locations while reducing deployment of sensors and 2) generate probabilistic prediction with uncertainty quantification to help the management of risk and decision making in the down-stream tasks. Through extensive experiments on real-life datasets, the result shows our method achieved promising results on prediction tasks, and the uncertainty quantification gives consistent results which highly correlated with the locations with and without historical data. We also show that our model could help support sensor deployment tasks in the transportation field to achieve higher accuracy with a limited sensor deployment budget.  ( 3 min )
    Compressed Real Numbers for AI: a case-study using a RISC-V CPU. (arXiv:2309.07158v1 [cs.LG])
    As recently demonstrated, Deep Neural Networks (DNN), usually trained using single precision IEEE 754 floating point numbers (binary32), can also work using lower precision. Therefore, 16-bit and 8-bit compressed format have attracted considerable attention. In this paper, we focused on two families of formats that have already achieved interesting results in compressing binary32 numbers in machine learning applications, without sensible degradation of the accuracy: bfloat and posit. Even if 16-bit and 8-bit bfloat/posit are routinely used for reducing the storage of the weights/biases of trained DNNs, the inference still often happens on the 32-bit FPU of the CPU (especially if GPUs are not available). In this paper we propose a way to decompress a tensor of bfloat/posits just before computations, i.e., after the compressed operands have been loaded within the vector registers of a vector capable CPU, in order to save bandwidth usage and increase cache efficiency. Finally, we show the architectural parameters and considerations under which this solution is advantageous with respect to the uncompressed one.  ( 2 min )
    eDKM: An Efficient and Accurate Train-time Weight Clustering for Large Language Models. (arXiv:2309.00964v2 [cs.LG] UPDATED)
    Since Large Language Models or LLMs have demonstrated high-quality performance on many complex language tasks, there is a great interest in bringing these LLMs to mobile devices for faster responses and better privacy protection. However, the size of LLMs (i.e., billions of parameters) requires highly effective compression to fit into storage-limited devices. Among many compression techniques, weight-clustering, a form of non-linear quantization, is one of the leading candidates for LLM compression, and supported by modern smartphones. Yet, its training overhead is prohibitively significant for LLM fine-tuning. Especially, Differentiable KMeans Clustering, or DKM, has shown the state-of-the-art trade-off between compression ratio and accuracy regression, but its large memory complexity makes it nearly impossible to apply to train-time LLM compression. In this paper, we propose a memory-efficient DKM implementation, eDKM powered by novel techniques to reduce the memory footprint of DKM by orders of magnitudes. For a given tensor to be saved on CPU for the backward pass of DKM, we compressed the tensor by applying uniquification and sharding after checking if there is no duplicated tensor previously copied to CPU. Our experimental results demonstrate that \prjname can fine-tune and compress a pretrained LLaMA 7B model from 12.6 GB to 2.5 GB (3bit/weight) with the Alpaca dataset by reducing the train-time memory footprint of a decoder layer by 130$\times$, while delivering good accuracy on broader LLM benchmarks (i.e., 77.7% for PIQA, 66.1% for Winograde, and so on).  ( 3 min )
    Machine Learning-Assisted Discovery of Novel Reactor Designs. (arXiv:2308.08841v2 [cs.CE] UPDATED)
    Additive manufacturing has enabled the fabrication of advanced reactor geometries, permitting larger, more complex design spaces. Identifying promising configurations within such spaces presents a significant challenge for current approaches. Furthermore, existing parameterisations of reactor geometries are low-dimensional with expensive optimisation limiting more complex solutions. To address this challenge, we establish a machine learning-assisted approach for the design of the next-generation of chemical reactors, combining the application of high-dimensional parameterisations, computational fluid dynamics, and multi-fidelity Bayesian optimisation. We associate the development of mixing-enhancing vortical flow structures in novel coiled reactors with performance, and use our approach to identify key characteristics of optimal designs. By appealing to fluid mechanical principles, we rationalise the selection of novel design features that lead to experimental performance improvements of ~60% over conventional designs. Our results demonstrate that coupling advanced manufacturing techniques with `augmented-intelligence' approaches can lead to superior design performance and, consequently, emissions-reduction and sustainability.  ( 2 min )
    Transferable Graph Neural Fingerprint Models for Quick Response to Future Bio-Threats. (arXiv:2308.01921v2 [q-bio.BM] UPDATED)
    Fast screening of drug molecules based on the ligand binding affinity is an important step in the drug discovery pipeline. Graph neural fingerprint is a promising method for developing molecular docking surrogates with high throughput and great fidelity. In this study, we built a COVID-19 drug docking dataset of about 300,000 drug candidates on 23 coronavirus protein targets. With this dataset, we trained graph neural fingerprint docking models for high-throughput virtual COVID-19 drug screening. The graph neural fingerprint models yield high prediction accuracy on docking scores with the mean squared error lower than $0.21$ kcal/mol for most of the docking targets, showing significant improvement over conventional circular fingerprint methods. To make the neural fingerprints transferable for unknown targets, we also propose a transferable graph neural fingerprint method trained on multiple targets. With comparable accuracy to target-specific graph neural fingerprint models, the transferable model exhibits superb training and data efficiency. We highlight that the impact of this study extends beyond COVID-19 dataset, as our approach for fast virtual ligand screening can be easily adapted and integrated into a general machine learning-accelerated pipeline to battle future bio-threats.  ( 3 min )
    Semantic Adversarial Attacks via Diffusion Models. (arXiv:2309.07398v1 [cs.CV])
    Traditional adversarial attacks concentrate on manipulating clean examples in the pixel space by adding adversarial perturbations. By contrast, semantic adversarial attacks focus on changing semantic attributes of clean examples, such as color, context, and features, which are more feasible in the real world. In this paper, we propose a framework to quickly generate a semantic adversarial attack by leveraging recent diffusion models since semantic information is included in the latent space of well-trained diffusion models. Then there are two variants of this framework: 1) the Semantic Transformation (ST) approach fine-tunes the latent space of the generated image and/or the diffusion model itself; 2) the Latent Masking (LM) approach masks the latent space with another target image and local backpropagation-based interpretation methods. Additionally, the ST approach can be applied in either white-box or black-box settings. Extensive experiments are conducted on CelebA-HQ and AFHQ datasets, and our framework demonstrates great fidelity, generalizability, and transferability compared to other baselines. Our approaches achieve approximately 100% attack success rate in multiple settings with the best FID as 36.61. Code is available at https://github.com/steven202/semantic_adv_via_dm.  ( 2 min )
    Meta-Learning Regrasping Strategies for Physical-Agnostic Objects. (arXiv:2205.11110v2 [cs.RO] UPDATED)
    Grasping inhomogeneous objects in real-world applications remains a challenging task due to the unknown physical properties such as mass distribution and coefficient of friction. In this study, we propose a meta-learning algorithm called ConDex, which incorporates Conditional Neural Processes (CNP) with DexNet-2.0 to autonomously discern the underlying physical properties of objects using depth images. ConDex efficiently acquires physical embeddings from limited trials, enabling precise grasping point estimation. Furthermore, ConDex is capable of updating the predicted grasping quality iteratively from new trials in an online fashion. To the best of our knowledge, we are the first who generate two object datasets focusing on inhomogeneous physical properties with varying mass distributions and friction coefficients. Extensive evaluations in simulation demonstrate ConDex's superior performance over DexNet-2.0 and existing meta-learning-based grasping pipelines. Furthermore, ConDex shows robust generalization to previously unseen real-world objects despite training solely in the simulation. The synthetic and real-world datasets will be published as well.  ( 2 min )
    An Adaptive Federated Relevance Framework for Spatial Temporal Graph Learning. (arXiv:2206.03420v3 [cs.LG] UPDATED)
    Spatial-temporal data contains rich information and has been widely studied in recent years due to the rapid development of relevant applications in many fields. For instance, medical institutions often use electrodes attached to different parts of a patient to analyse the electorencephal data rich with spatial and temporal features for health assessment and disease diagnosis. Existing research has mainly used deep learning techniques such as convolutional neural network (CNN) or recurrent neural network (RNN) to extract hidden spatial-temporal features. Yet, it is challenging to incorporate both inter-dependencies spatial information and dynamic temporal changes simultaneously. In reality, for a model that leverages these spatial-temporal features to fulfil complex prediction tasks, it often requires a colossal amount of training data in order to obtain satisfactory model performance. Considering the above-mentioned challenges, we propose an adaptive federated relevance framework, namely FedRel, for spatial-temporal graph learning in this paper. After transforming the raw spatial-temporal data into high quality features, the core Dynamic Inter-Intra Graph (DIIG) module in the framework is able to use these features to generate the spatial-temporal graphs capable of capturing the hidden topological and long-term temporal correlation information in these graphs. To improve the model generalization ability and performance while preserving the local data privacy, we also design a relevance-driven federated learning module in our framework to leverage diverse data distributions from different participants with attentive aggregations of their models.
    Self-optimizing Feature Generation via Categorical Hashing Representation and Hierarchical Reinforcement Crossing. (arXiv:2309.04612v2 [cs.LG] UPDATED)
    Feature generation aims to generate new and meaningful features to create a discriminative representation space.A generated feature is meaningful when the generated feature is from a feature pair with inherent feature interaction. In the real world, experienced data scientists can identify potentially useful feature-feature interactions, and generate meaningful dimensions from an exponentially large search space, in an optimal crossing form over an optimal generation path. But, machines have limited human-like abilities.We generalize such learning tasks as self-optimizing feature generation. Self-optimizing feature generation imposes several under-addressed challenges on existing systems: meaningful, robust, and efficient generation. To tackle these challenges, we propose a principled and generic representation-crossing framework to solve self-optimizing feature generation.To achieve hashing representation, we propose a three-step approach: feature discretization, feature hashing, and descriptive summarization. To achieve reinforcement crossing, we develop a hierarchical reinforcement feature crossing approach.We present extensive experimental results to demonstrate the effectiveness and efficiency of the proposed method. The code is available at https://github.com/yingwangyang/HRC_feature_cross.git.  ( 2 min )
    What Matters to Enhance Traffic Rule Compliance of Imitation Learning for Automated Driving. (arXiv:2309.07808v1 [cs.CV])
    More research attention has recently been given to end-to-end autonomous driving technologies where the entire driving pipeline is replaced with a single neural network because of its simpler structure and faster inference time. Despite this appealing approach largely reducing the components in driving pipeline, its simplicity also leads to interpretability problems and safety issues arXiv:2003.06404. The trained policy is not always compliant with the traffic rules and it is also hard to discover the reason for the misbehavior because of the lack of intermediate outputs. Meanwhile, Sensors are also critical to autonomous driving's security and feasibility to perceive the surrounding environment under complex driving scenarios. In this paper, we proposed P-CSG, a novel penalty-based imitation learning approach with cross semantics generation sensor fusion technologies to increase the overall performance of End-to-End Autonomous Driving. We conducted an assessment of our model's performance using the Town 05 Long benchmark, achieving an impressive driving score improvement of over 15%. Furthermore, we conducted robustness evaluations against adversarial attacks like FGSM and Dot attacks, revealing a substantial increase in robustness compared to baseline models.More detailed information, such as code-based resources, ablation studies and videos can be found at https://hk-zh.github.io/p-csg-plus.  ( 2 min )
    Directed Scattering for Knowledge Graph-based Cellular Signaling Analysis. (arXiv:2309.07813v1 [cs.LG])
    Directed graphs are a natural model for many phenomena, in particular scientific knowledge graphs such as molecular interaction or chemical reaction networks that define cellular signaling relationships. In these situations, source nodes typically have distinct biophysical properties from sinks. Due to their ordered and unidirectional relationships, many such networks also have hierarchical and multiscale structure. However, the majority of methods performing node- and edge-level tasks in machine learning do not take these properties into account, and thus have not been leveraged effectively for scientific tasks such as cellular signaling network inference. We propose a new framework called Directed Scattering Autoencoder (DSAE) which uses a directed version of a geometric scattering transform, combined with the non-linear dimensionality reduction properties of an autoencoder and the geometric properties of the hyperbolic space to learn latent hierarchies. We show this method outperforms numerous others on tasks such as embedding directed graphs and learning cellular signaling networks.  ( 2 min )
    Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining. (arXiv:2309.07332v1 [cs.LG])
    Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.  ( 3 min )
    Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary tasks. (arXiv:2309.07794v1 [cs.CL])
    Effectively leveraging multimodal information from social media posts is essential to various downstream tasks such as sentiment analysis, sarcasm detection and hate speech classification. However, combining text and image information is challenging because of the idiosyncratic cross-modal semantics with hidden or complementary information present in matching image-text pairs. In this work, we aim to directly model this by proposing the use of two auxiliary losses jointly with the main task when fine-tuning any pre-trained multimodal model. Image-Text Contrastive (ITC) brings image-text representations of a post closer together and separates them from different posts, capturing underlying dependencies. Image-Text Matching (ITM) facilitates the understanding of semantic correspondence between images and text by penalizing unrelated pairs. We combine these objectives with five multimodal models, demonstrating consistent improvements across four popular social media datasets. Furthermore, through detailed analysis, we shed light on the specific scenarios and cases where each auxiliary task proves to be most effective.  ( 2 min )
    Benchmarking Jetson Edge Devices with an End-to-end Video-based Anomaly Detection System. (arXiv:2307.16834v3 [cs.CV] UPDATED)
    Innovative enhancement in embedded system platforms, specifically hardware accelerations, significantly influence the application of deep learning in real-world scenarios. These innovations translate human labor efforts into automated intelligent systems employed in various areas such as autonomous driving, robotics, Internet-of-Things (IoT), and numerous other impactful applications. NVIDIA's Jetson platform is one of the pioneers in offering optimal performance regarding energy efficiency and throughput in the execution of deep learning algorithms. Previously, most benchmarking analysis was based on 2D images with a single deep learning model for each comparison result. In this paper, we implement an end-to-end video-based crime-scene anomaly detection system inputting from surveillance videos and the system is deployed and completely operates on multiple Jetson edge devices (Nano, AGX Xavier, Orin Nano). The comparison analysis includes the integration of Torch-TensorRT as a software developer kit from NVIDIA for the model performance optimisation. The system is built based on the PySlowfast open-source project from Facebook as the coding template. The end-to-end system process comprises the videos from camera, data preprocessing pipeline, feature extractor and the anomaly detection. We provide the experience of an AI-based system deployment on various Jetson Edge devices with Docker technology. Regarding anomaly detectors, a weakly supervised video-based deep learning model called Robust Temporal Feature Magnitude Learning (RTFM) is applied in the system. The approach system reaches 47.56 frames per second (FPS) inference speed on a Jetson edge device with only 3.11 GB RAM usage total. We also discover the promising Jetson device that the AI system achieves 15% better performance than the previous version of Jetson devices while consuming 50% less energy power.  ( 3 min )
    Multi-step prediction of chlorophyll concentration based on Adaptive Graph-Temporal Convolutional Network with Series Decomposition. (arXiv:2309.07187v1 [cs.LG])
    Chlorophyll concentration can well reflect the nutritional status and algal blooms of water bodies, and is an important indicator for evaluating water quality. The prediction of chlorophyll concentration change trend is of great significance to environmental protection and aquaculture. However, there is a complex and indistinguishable nonlinear relationship between many factors affecting chlorophyll concentration. In order to effectively mine the nonlinear features contained in the data. This paper proposes a time-series decomposition adaptive graph-time convolutional network ( AGTCNSD ) prediction model. Firstly, the original sequence is decomposed into trend component and periodic component by moving average method. Secondly, based on the graph convolutional neural network, the water quality parameter data is modeled, and a parameter embedding matrix is defined. The idea of matrix decomposition is used to assign weight parameters to each node. The adaptive graph convolution learns the relationship between different water quality parameters, updates the state information of each parameter, and improves the learning ability of the update relationship between nodes. Finally, time dependence is captured by time convolution to achieve multi-step prediction of chlorophyll concentration. The validity of the model is verified by the water quality data of the coastal city Beihai. The results show that the prediction effect of this method is better than other methods. It can be used as a scientific resource for environmental management decision-making.  ( 3 min )
    Reasoning with Language Model Prompting: A Survey. (arXiv:2212.09597v7 [cs.CL] UPDATED)
    Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc. This paper provides a comprehensive survey of cutting-edge research on reasoning with language model prompting. We introduce research works with comparisons and summaries and provide systematic resources to help beginners. We also discuss the potential reasons for emerging such reasoning abilities and highlight future research directions. Resources are available at https://github.com/zjunlp/Prompt4ReasoningPapers (updated periodically).  ( 2 min )
    Detecting Misinformation with LLM-Predicted Credibility Signals and Weak Supervision. (arXiv:2309.07601v1 [cs.CL])
    Credibility signals represent a wide range of heuristics that are typically used by journalists and fact-checkers to assess the veracity of online content. Automating the task of credibility signal extraction, however, is very challenging as it requires high-accuracy signal-specific extractors to be trained, while there are currently no sufficiently large datasets annotated with all credibility signals. This paper investigates whether large language models (LLMs) can be prompted effectively with a set of 18 credibility signals to produce weak labels for each signal. We then aggregate these potentially noisy labels using weak supervision in order to predict content veracity. We demonstrate that our approach, which combines zero-shot LLM credibility signal labeling and weak supervision, outperforms state-of-the-art classifiers on two misinformation datasets without using any ground-truth labels for training. We also analyse the contribution of the individual credibility signals towards predicting content veracity, which provides new valuable insights into their role in misinformation detection.  ( 2 min )
    Text Classification of Cancer Clinical Trial Eligibility Criteria. (arXiv:2309.07812v1 [cs.CL])
    Automatic identification of clinical trials for which a patient is eligible is complicated by the fact that trial eligibility is stated in natural language. A potential solution to this problem is to employ text classification methods for common types of eligibility criteria. In this study, we focus on seven common exclusion criteria in cancer trials: prior malignancy, human immunodeficiency virus, hepatitis B, hepatitis C, psychiatric illness, drug/substance abuse, and autoimmune illness. Our dataset consists of 764 phase III cancer trials with these exclusions annotated at the trial level. We experiment with common transformer models as well as a new pre-trained clinical trial BERT model. Our results demonstrate the feasibility of automatically classifying common exclusion criteria. Additionally, we demonstrate the value of a pre-trained language model specifically for clinical trials, which yields the highest average performance across all criteria.  ( 2 min )
    Understanding Vector-Valued Neural Networks and Their Relationship with Real and Hypercomplex-Valued Neural Networks. (arXiv:2309.07716v1 [cs.LG])
    Despite the many successful applications of deep learning models for multidimensional signal and image processing, most traditional neural networks process data represented by (multidimensional) arrays of real numbers. The intercorrelation between feature channels is usually expected to be learned from the training data, requiring numerous parameters and careful training. In contrast, vector-valued neural networks are conceived to process arrays of vectors and naturally consider the intercorrelation between feature channels. Consequently, they usually have fewer parameters and often undergo more robust training than traditional neural networks. This paper aims to present a broad framework for vector-valued neural networks, referred to as V-nets. In this context, hypercomplex-valued neural networks are regarded as vector-valued models with additional algebraic properties. Furthermore, this paper explains the relationship between vector-valued and traditional neural networks. Precisely, a vector-valued neural network can be obtained by placing restrictions on a real-valued model to consider the intercorrelation between feature channels. Finally, we show how V-nets, including hypercomplex-valued neural networks, can be implemented in current deep-learning libraries as real-valued networks.  ( 2 min )
    Entropy-based machine learning model for diagnosis and monitoring of Parkinson's Disease in smart IoT environment. (arXiv:2309.07134v1 [eess.SP])
    The study presents the concept of a computationally efficient machine learning (ML) model for diagnosing and monitoring Parkinson's disease (PD) in an Internet of Things (IoT) environment using rest-state EEG signals (rs-EEG). We computed different types of entropy from EEG signals and found that Fuzzy Entropy performed the best in diagnosing and monitoring PD using rs-EEG. We also investigated different combinations of signal frequency ranges and EEG channels to accurately diagnose PD. Finally, with a fewer number of features (11 features), we achieved a maximum classification accuracy (ARKF) of ~99.9%. The most prominent frequency range of EEG signals has been identified, and we have found that high classification accuracy depends on low-frequency signal components (0-4 Hz). Moreover, the most informative signals were mainly received from the right hemisphere of the head (F8, P8, T8, FC6). Furthermore, we assessed the accuracy of the diagnosis of PD using three different lengths of EEG data (150-1000 samples). Because the computational complexity is reduced by reducing the input data. As a result, we have achieved a maximum mean accuracy of 99.9% for a sample length (LEEG) of 1000 (~7.8 seconds), 98.2% with a LEEG of 800 (~6.2 seconds), and 79.3% for LEEG = 150 (~1.2 seconds). By reducing the number of features and segment lengths, the computational cost of classification can be reduced. Lower-performance smart ML sensors can be used in IoT environments for enhances human resilience to PD.  ( 3 min )
    Random Feature Amplification: Feature Learning and Generalization in Neural Networks. (arXiv:2202.07626v4 [cs.LG] UPDATED)
    In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.  ( 2 min )
    deepFDEnet: A Novel Neural Network Architecture for Solving Fractional Differential Equations. (arXiv:2309.07684v1 [cs.LG])
    The primary goal of this research is to propose a novel architecture for a deep neural network that can solve fractional differential equations accurately. A Gaussian integration rule and a $L_1$ discretization technique are used in the proposed design. In each equation, a deep neural network is used to approximate the unknown function. Three forms of fractional differential equations have been examined to highlight the method's versatility: a fractional ordinary differential equation, a fractional order integrodifferential equation, and a fractional order partial differential equation. The results show that the proposed architecture solves different forms of fractional differential equations with excellent precision.  ( 2 min )
    Interpretable Weighted Siamese Network to Predict the Time to Onset of Alzheimer's Disease from MRI Images. (arXiv:2304.07097v2 [eess.IV] UPDATED)
    Alzheimer's Disease (AD) is a progressive disease preceded by Mild Cognitive Impairment (MCI). Early detection of AD is crucial for making treatment decisions. However, most of the literature on computer-assisted detection of AD focuses on classifying brain images into one of three major categories: healthy, MCI, and AD; or categorizing MCI patients into (1) progressive: those who progress from MCI to AD at a future examination time, and (2) stable: those who stay as MCI and never progress to AD. This misses the opportunity to accurately identify the trajectory of progressive MCI patients. In this paper, we revisit the brain image classification task for AD identification and re-frame it as an ordinal classification task to predict how close a patient is to the severe AD stage. To this end, we select progressive MCI patients from the Alzheimer's Disease Neuroimaging Initiative (ADNI) dataset and construct an ordinal dataset with a prediction target that indicates the time to progression to AD. We train a Siamese network model to predict the time to onset of AD based on MRI brain images. We also propose a Weighted variety of Siamese network and compare its performance to a baseline model. Our evaluations show that incorporating a weighting factor to Siamese networks brings considerable performance gain at predicting how close input brain MRI images are to progressing to AD. Moreover, we complement our results with an interpretation of the learned embedding space of the Siamese networks using a model explainability technique.  ( 3 min )
    Reward is not Necessary: How to Create a Compositional Self-Preserving Agent for Life-Long Learning. (arXiv:2211.10851v3 [cs.AI] UPDATED)
    Reinforcement Learning views the maximization of rewards and avoidance of punishments as central to explaining goal-directed behavior. However, over a life, organisms will need to learn about many different aspects of the world's structure: the states of the world and state-vector transition dynamics. The number of combinations of states grows exponentially as an agent incorporates new knowledge, and there is no obvious weighted combination of pre-existing rewards or costs defined for a given combination of states, as such a weighting would need to encode information about good and bad combinations prior to an agent's experience in the world. Therefore, we must develop more naturalistic accounts of behavior and motivation in large state-spaces. We show that it is possible to use only the intrinsic motivation metric of empowerment, which measures the agent's capacity to realize many possible futures under a transition operator. We propose to scale empowerment to hierarchical state-spaces by using Operator Bellman Equations. These equations produce state-time feasibility functions, which are compositional hierarchical state-time transition operators that map an initial state and time when an agent begins a policy to the final states and times of completing a goal. Because these functions are hierarchical operators we can define hierarchical empowerment measures on them. An agent can then optimize plans to distant states and times to maximize its hierarchical empowerment-gain, allowing it to discover goals that bring about a more favorable coupling of its internal structure (physiological states) to its external environment (world structure & spatial state). Life-long agents could therefore be primarily animated by principles of compositionality and empowerment, exhibiting self-concern for the growth & maintenance of their own structural integrity without recourse to reward-maximization.  ( 3 min )
    Masked Transformer for Electrocardiogram Classification. (arXiv:2309.07136v1 [eess.SP])
    Electrocardiogram (ECG) is one of the most important diagnostic tools in clinical applications. With the advent of advanced algorithms, various deep learning models have been adopted for ECG tasks. However, the potential of Transformers for ECG data is not yet realized, despite their widespread success in computer vision and natural language processing. In this work, we present a useful masked Transformer method for ECG classification referred to as MTECG, which expands the application of masked autoencoders to ECG time series. We construct a dataset comprising 220,251 ECG recordings with a broad range of diagnoses annoated by medical experts to explore the properties of MTECG. Under the proposed training strategies, a lightweight model with 5.7M parameters performs stably well on a broad range of masking ratios (5%-75%). The ablation studies highlight the importance of fluctuated reconstruction targets, training schedule length, layer-wise LR decay and DropPath rate. The experiments on both private and public ECG datasets demonstrate that MTECG-T significantly outperforms the recent state-of-the-art algorithms in ECG classification.  ( 2 min )
    Structure-Preserving Transformers for Sequences of SPD Matrices. (arXiv:2309.07579v1 [cs.LG])
    In recent years, Transformer-based auto-attention mechanisms have been successfully applied to the analysis of a variety of context-reliant data types, from texts to images and beyond, including data from non-Euclidean geometries. In this paper, we present such a mechanism, designed to classify sequences of Symmetric Positive Definite matrices while preserving their Riemannian geometry throughout the analysis. We apply our method to automatic sleep staging on timeseries of EEG-derived covariance matrices from a standard dataset, obtaining high levels of stage-wise performance.
    A Novel Local-Global Feature Fusion Framework for Body-weight Exercise Recognition with Pressure Mapping Sensors. (arXiv:2309.07888v1 [cs.CV])
    We present a novel local-global feature fusion framework for body-weight exercise recognition with floor-based dynamic pressure maps. One step further from the existing studies using deep neural networks mainly focusing on global feature extraction, the proposed framework aims to combine local and global features using image processing techniques and the YOLO object detection to localize pressure profiles from different body parts and consider physical constraints. The proposed local feature extraction method generates two sets of high-level local features consisting of cropped pressure mapping and numerical features such as angular orientation, location on the mat, and pressure area. In addition, we adopt a knowledge distillation for regularization to preserve the knowledge of the global feature extraction and improve the performance of the exercise recognition. Our experimental results demonstrate a notable 11 percent improvement in F1 score for exercise recognition while preserving label-specific features.  ( 2 min )
    Deep reinforced learning heuristic tested on spin-glass ground states: The larger picture. (arXiv:2302.10848v2 [cond-mat.dis-nn] UPDATED)
    In Changjun Fan et al. [Nature Communications https://doi.org/10.1038/s41467-023-36363-w (2023)], the authors present a deep reinforced learning approach to augment combinatorial optimization heuristics. In particular, they present results for several spin glass ground state problems, for which instances on non-planar networks are generally NP-hard, in comparison with several Monte Carlo based methods, such as simulated annealing (SA) or parallel tempering (PT). Indeed, those results demonstrate that the reinforced learning improves the results over those obtained with SA or PT, or at least allows for reduced runtimes for the heuristics before results of comparable quality have been obtained relative to those other methods. To facilitate the conclusion that their method is ''superior'', the authors pursue two basic strategies: (1) A commercial GUROBI solver is called on to procure a sample of exact ground states as a testbed to compare with, and (2) a head-to-head comparison between the heuristics is given for a sample of larger instances where exact ground states are hard to ascertain. Here, we put these studies into a larger context, showing that the claimed superiority is at best marginal for smaller samples and becomes essentially irrelevant with respect to any sensible approximation of true ground states in the larger samples. For example, this method becomes irrelevant as a means to determine stiffness exponents $\theta$ in $d>2$, as mentioned by the authors, where the problem is not only NP-hard but requires the subtraction of two almost equal ground-state energies and systemic errors in each of $\approx 1\%$ found here are unacceptable. This larger picture on the method arises from a straightforward finite-size corrections study over the spin glass ensembles the authors employ, using data that has been available for decades.
    DoRA: Domain-Based Self-Supervised Learning Framework for Low-Resource Real Estate Appraisal. (arXiv:2309.00855v3 [cs.LG] UPDATED)
    The marketplace system connecting demands and supplies has been explored to develop unbiased decision-making in valuing properties. Real estate appraisal serves as one of the high-cost property valuation tasks for financial institutions since it requires domain experts to appraise the estimation based on the corresponding knowledge and the judgment of the market. Existing automated valuation models reducing the subjectivity of domain experts require a large number of transactions for effective evaluation, which is predominantly limited to not only the labeling efforts of transactions but also the generalizability of new developing and rural areas. To learn representations from unlabeled real estate sets, existing self-supervised learning (SSL) for tabular data neglects various important features, and fails to incorporate domain knowledge. In this paper, we propose DoRA, a Domain-based self-supervised learning framework for low-resource Real estate Appraisal. DoRA is pre-trained with an intra-sample geographic prediction as the pretext task based on the metadata of the real estate for equipping the real estate representations with prior domain knowledge. Furthermore, inter-sample contrastive learning is employed to generalize the representations to be robust for limited transactions of downstream tasks. Our benchmark results on three property types of real-world transactions show that DoRA significantly outperforms the SSL baselines for tabular data, the graph-based methods, and the supervised approaches in the few-shot scenarios by at least 7.6% for MAPE, 11.59% for MAE, and 3.34% for HR10%. We expect DoRA to be useful to other financial practitioners with similar marketplace applications who need general models for properties that are newly built and have limited records. The source code is available at https://github.com/wwweiwei/DoRA.
    Evaluation of Parameter-based Attacks against Embedded Neural Networks with Laser Injection. (arXiv:2304.12876v2 [cs.CR] UPDATED)
    Upcoming certification actions related to the security of machine learning (ML) based systems raise major evaluation challenges that are amplified by the large-scale deployment of models in many hardware platforms. Until recently, most of research works focused on API-based attacks that consider a ML model as a pure algorithmic abstraction. However, new implementation-based threats have been revealed, emphasizing the urgency to propose both practical and simulation-based methods to properly evaluate the robustness of models. A major concern is parameter-based attacks (such as the Bit-Flip Attack, BFA) that highlight the lack of robustness of typical deep neural network models when confronted by accurate and optimal alterations of their internal parameters stored in memory. Setting in a security testing purpose, this work practically reports, for the first time, a successful variant of the BFA on a 32-bit Cortex-M microcontroller using laser fault injection. It is a standard fault injection means for security evaluation, that enables to inject spatially and temporally accurate faults. To avoid unrealistic brute-force strategies, we show how simulations help selecting the most sensitive set of bits from the parameters taking into account the laser fault model.
    Learning nonparametric DAGs with incremental information via high-order HSIC. (arXiv:2308.05969v2 [cs.LG] UPDATED)
    Score-based methods for learning Bayesain networks(BN) aim to maximizing the global score functions. However, if local variables have direct and indirect dependence simultaneously, the global optimization on score functions misses edges between variables with indirect dependent relationship, of which scores are smaller than those with direct dependent relationship. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. By the identifiability condition, we develop a two-phase algorithm namely optimal-tuning (OT) algorithm to locally amend the global optimization. In the optimal phase, an optimization problem based on first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parents subset. In the tuning phase, the skeleton is locally tuned by deletion, addition and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments for different synthetic datasets and real-world datasets show that the OT algorithm outperforms existing methods. Especially in Sigmoid Mix model with the size of the graph being ${\rm\bf d=40}$, the structure intervention distance (SID) of the OT algorithm is 329.7 smaller than the one obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM.Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm.
    TensorFlow Chaotic Prediction and Blow Up. (arXiv:2309.07450v1 [cs.LG])
    Predicting the dynamics of chaotic systems is one of the most challenging tasks for neural networks, and machine learning in general. Here we aim to predict the spatiotemporal chaotic dynamics of a high-dimensional non-linear system. In our attempt we use the TensorFlow library, representing the state of the art for deep neural networks training and prediction. While our results are encouraging, and show that the dynamics of the considered system can be predicted for short time, we also indirectly discovered an unexpected and undesirable behavior of the TensorFlow library. More specifically, the longer term prediction of the system's chaotic behavior quickly deteriorates and blows up due to the nondeterministic behavior of the TensorFlow library. Here we provide numerical evidence of the short time prediction ability, and of the longer term predictability blow up.
    Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense. (arXiv:2309.06724v2 [cs.CV] UPDATED)
    We aim to provide a general framework of for computational photography that recovers the real scene from imperfect images, via the Deep Nonparametric Convexified Filtering (DNCF). It is consists of a nonparametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data, therefore has a strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative and create a bi-convex function on the input and parameters, and this adapts to second-order optimization algorithms with insufficient running time, having 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversary attack algorithms in real-time.
    Autotuning Apache TVM-based Scientific Applications Using Bayesian Optimization. (arXiv:2309.07235v1 [cs.LG])
    Apache TVM (Tensor Virtual Machine), an open source machine learning compiler framework designed to optimize computations across various hardware platforms, provides an opportunity to improve the performance of dense matrix factorizations such as LU (Lower Upper) decomposition and Cholesky decomposition on GPUs and AI (Artificial Intelligence) accelerators. In this paper, we propose a new TVM autotuning framework using Bayesian Optimization and use the TVM tensor expression language to implement linear algebra kernels such as LU, Cholesky, and 3mm. We use these scientific computation kernels to evaluate the effectiveness of our methods on a GPU cluster, called Swing, at Argonne National Laboratory. We compare the proposed autotuning framework with the TVM autotuning framework AutoTVM with four tuners and find that our framework outperforms AutoTVM in most cases.
    Interpretability is in the Mind of the Beholder: A Causal Framework for Human-interpretable Representation Learning. (arXiv:2309.07742v1 [cs.LG])
    Focus in Explainable AI is shifting from explanations defined in terms of low-level elements, such as input features, to explanations encoded in terms of interpretable concepts learned from data. How to reliably acquire such concepts is, however, still fundamentally unclear. An agreed-upon notion of concept interpretability is missing, with the result that concepts used by both post-hoc explainers and concept-based neural networks are acquired through a variety of mutually incompatible strategies. Critically, most of these neglect the human side of the problem: a representation is understandable only insofar as it can be understood by the human at the receiving end. The key challenge in Human-interpretable Representation Learning (HRL) is how to model and operationalize this human element. In this work, we propose a mathematical framework for acquiring interpretable representations suitable for both post-hoc explainers and concept-based neural networks. Our formalization of HRL builds on recent advances in causal representation learning and explicitly models a human stakeholder as an external observer. This allows us to derive a principled notion of alignment between the machine representation and the vocabulary of concepts understood by the human. In doing so, we link alignment and interpretability through a simple and intuitive name transfer game, and clarify the relationship between alignment and a well-known property of representations, namely disentanglment. We also show that alignment is linked to the issue of undesirable correlations among concepts, also known as concept leakage, and to content-style separation, all through a general information-theoretic reformulation of these properties. Our conceptualization aims to bridge the gap between the human and algorithmic sides of interpretability and establish a stepping stone for new research on human-interpretable representations.
    Learning to Warm-Start Fixed-Point Optimization Algorithms. (arXiv:2309.07835v1 [math.OC])
    We introduce a machine-learning framework to warm-start fixed-point optimization algorithms. Our architecture consists of a neural network mapping problem parameters to warm starts, followed by a predefined number of fixed-point iterations. We propose two loss functions designed to either minimize the fixed-point residual or the distance to a ground truth solution. In this way, the neural network predicts warm starts with the end-to-end goal of minimizing the downstream loss. An important feature of our architecture is its flexibility, in that it can predict a warm start for fixed-point algorithms run for any number of steps, without being limited to the number of steps it has been trained on. We provide PAC-Bayes generalization bounds on unseen data for common classes of fixed-point operators: contractive, linearly convergent, and averaged. Applying this framework to well-known applications in control, statistics, and signal processing, we observe a significant reduction in the number of iterations and solution time required to solve these problems, through learned warm starts.
    On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems. (arXiv:1906.00331v9 [cs.LG] UPDATED)
    We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a convex and bounded set. One of the most popular algorithms for solving this problem is the celebrated gradient descent ascent (GDA) algorithm, which has been widely used in machine learning, control theory and economics. Despite the extensive convergence results for the convex-concave setting, GDA with equal stepsize can converge to limit cycles or even diverge in a general setting. In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. To the best our knowledge, this is the first nonasymptotic analysis for two-time-scale GDA in this setting, shedding light on its superior practical performance in training generative adversarial networks (GANs) and other real applications.
    Causal Entropy and Information Gain for Measuring Causal Control. (arXiv:2309.07703v1 [cs.LG])
    Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.
    Improving physics-informed DeepONets with hard constraints. (arXiv:2309.07899v1 [cs.LG])
    Current physics-informed (standard or operator) neural networks still rely on accurately learning the initial conditions of the system they are solving. In contrast, standard numerical methods evolve such initial conditions without needing to learn these. In this study, we propose to improve current physics-informed deep learning strategies such that initial conditions do not need to be learned and are represented exactly in the predicted solution. Moreover, this method guarantees that when a DeepONet is applied multiple times to time step a solution, the resulting function is continuous.
    Identifying the Group-Theoretic Structure of Machine-Learned Symmetries. (arXiv:2309.07860v1 [hep-ph])
    Deep learning was recently successfully used in deriving symmetry transformations that preserve important physics quantities. Being completely agnostic, these techniques postpone the identification of the discovered symmetries to a later stage. In this letter we propose methods for examining and identifying the group-theoretic structure of such machine-learned symmetries. We design loss functions which probe the subalgebra structure either during the deep learning stage of symmetry discovery or in a subsequent post-processing stage. We illustrate the new methods with examples from the U(n) Lie group family, obtaining the respective subalgebra decompositions. As an application to particle physics, we demonstrate the identification of the residual symmetries after the spontaneous breaking of non-Abelian gauge symmetries like SU(3) and SU(5) which are commonly used in model building.
    A DenseNet-based method for decoding auditory spatial attention with EEG. (arXiv:2309.07690v1 [eess.SP])
    Auditory spatial attention detection (ASAD) aims to decode the attended spatial location with EEG in a multiple-speaker setting. ASAD methods are inspired by the brain lateralization of cortical neural responses during the processing of auditory spatial attention, and show promising performance for the task of auditory attention decoding (AAD) with neural recordings. In the previous ASAD methods, the spatial distribution of EEG electrodes is not fully exploited, which may limit the performance of these methods. In the present work, by transforming the original EEG channels into a two-dimensional (2D) spatial topological map, the EEG data is transformed into a three-dimensional (3D) arrangement containing spatial-temporal information. And then a 3D deep convolutional neural network (DenseNet-3D) is used to extract temporal and spatial features of the neural representation for the attended locations. The results show that the proposed method achieves higher decoding accuracy than the state-of-the-art (SOTA) method (94.4% compared to XANet's 90.6%) with 1-second decision window for the widely used KULeuven (KUL) dataset, and the code to implement our work is available on Github: https://github.com/xuxiran/ASAD_DenseNet
    Pareto Adversarial Robustness: Balancing Spatial Robustness and Sensitivity-based Robustness. (arXiv:2111.01996v2 [cs.LG] UPDATED)
    Adversarial robustness, which primarily comprises sensitivity-based robustness and spatial robustness, plays an integral part in achieving robust generalization. In this paper, we endeavor to design strategies to achieve universal adversarial robustness. To achieve this, we first investigate the relatively less-explored realm of spatial robustness. Then, we integrate the existing spatial robustness methods by incorporating both local and global spatial vulnerability into a unified spatial attack and adversarial training approach. Furthermore, we present a comprehensive relationship between natural accuracy, sensitivity-based robustness, and spatial robustness, supported by strong evidence from the perspective of robust representation. Crucially, to reconcile the interplay between the mutual impacts of various robustness components into one unified framework, we incorporate the \textit{Pareto criterion} into the adversarial robustness analysis, yielding a novel strategy called Pareto Adversarial Training for achieving universal robustness. The resulting Pareto front, which delineates the set of optimal solutions, provides an optimal balance between natural accuracy and various adversarial robustness. This sheds light on solutions for achieving universal robustness in the future. To the best of our knowledge, we are the first to consider universal adversarial robustness via multi-objective optimization.
    Goal Space Abstraction in Hierarchical Reinforcement Learning via Set-Based Reachability Analysis. (arXiv:2309.07675v1 [cs.LG])
    Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this paper, we propose a developmental mechanism for goal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We introduce a Feudal HRL algorithm that concurrently learns both the goal representation and a hierarchical policy. The algorithm uses symbolic reachability analysis for neural networks to approximate the transition relation among sets of states and to refine the goal representation. We evaluate our approach on complex navigation tasks, showing the learned representation is interpretable, transferrable and results in data efficient learning.
    Semi-supervised Domain Adaptation on Graphs with Contrastive Learning and Minimax Entropy. (arXiv:2309.07402v1 [cs.LG])
    Label scarcity in a graph is frequently encountered in real-world applications due to the high cost of data labeling. To this end, semi-supervised domain adaptation (SSDA) on graphs aims to leverage the knowledge of a labeled source graph to aid in node classification on a target graph with limited labels. SSDA tasks need to overcome the domain gap between the source and target graphs. However, to date, this challenging research problem has yet to be formally considered by the existing approaches designed for cross-graph node classification. To tackle the SSDA problem on graphs, a novel method called SemiGCL is proposed, which benefits from graph contrastive learning and minimax entropy training. SemiGCL generates informative node representations by contrasting the representations learned from a graph's local and global views. Additionally, SemiGCL is adversarially optimized with the entropy loss of unlabeled target nodes to reduce domain divergence. Experimental results on benchmark datasets demonstrate that SemiGCL outperforms the state-of-the-art baselines on the SSDA tasks.
    Beta Diffusion. (arXiv:2309.07867v1 [cs.LG])
    We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them.
    Conformal Regression in Calorie Prediction for Team Jumbo-Visma. (arXiv:2304.03778v3 [cs.LG] UPDATED)
    UCI WorldTour races, the premier men's elite road cycling tour, are grueling events that put physical fitness and endurance of riders to the test. The coaches of Team Jumbo-Visma have long been responsible for predicting the energy needs of each rider of the Dutch team for every race on the calendar. Those must be estimated to ensure riders have the energy and resources necessary to maintain a high level of performance throughout a race. This task, however, is both time-consuming and challenging, as it requires precise estimates of race speed and power output. Traditionally, the approach to predicting energy needs has relied on judgement and experience of coaches, but this method has its limitations and often leads to inaccurate predictions. In this paper, we propose a new, more effective approach to predicting energy needs for cycling races. By predicting the speed and power with regression models, we provide the coaches with calorie needs estimates for each individual rider per stage instantly. In addition, we compare methods to quantify uncertainty using conformal prediction. The empirical analysis of the jackknife+, jackknife-minmax, jackknife-minmax-after-bootstrap, CV+, CV-minmax, conformalized quantile regression, and inductive conformal prediction methods in conformal prediction reveals that all methods achieve valid prediction intervals. All but minmax-based methods also produce sufficiently narrow prediction intervals for decision-making. Furthermore, methods computing prediction intervals of fixed size produce tighter intervals for low significance values. Among the methods computing intervals of varying length across the input space, inductive conformal prediction computes narrower prediction intervals at larger significance level.
    Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning. (arXiv:2309.07666v1 [cs.LG])
    In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as 1 sample per class, one achieves state-of-the-art adaptation performance.
    Optimal transport distances for directed, weighted graphs: a case study with cell-cell communication networks. (arXiv:2309.07030v2 [cs.LG] UPDATED)
    Comparing graphs by means of optimal transport has recently gained significant attention, as the distances induced by optimal transport provide both a principled metric between graphs as well as an interpretable description of the associated changes between graphs in terms of a transport plan. As the lack of symmetry introduces challenges in the typically considered formulations, optimal transport distances for graphs have mostly been developed for undirected graphs. Here, we propose two distance measures to compare directed graphs based on variants of optimal transport: (i) an earth movers distance (Wasserstein) and (ii) a Gromov-Wasserstein (GW) distance. We evaluate these two distances and discuss their relative performance for both simulated graph data and real-world directed cell-cell communication graphs, inferred from single-cell RNA-seq data.
    Preserved Edge Convolutional Neural Network for Sensitivity Enhancement of Deuterium Metabolic Imaging (DMI). (arXiv:2309.04100v2 [eess.IV] UPDATED)
    Purpose: Common to most MRSI techniques, the spatial resolution and the minimal scan duration of Deuterium Metabolic Imaging (DMI) are limited by the achievable SNR. This work presents a deep learning method for sensitivity enhancement of DMI. Methods: A convolutional neural network (CNN) was designed to estimate the 2H-labeled metabolite concentrations from low SNR and distorted DMI FIDs. The CNN was trained with synthetic data that represent a range of SNR levels typically encountered in vivo. The estimation precision was further improved by fine-tuning the CNN with MRI-based edge-preserving regularization for each DMI dataset. The proposed processing method, PReserved Edge ConvolutIonal neural network for Sensitivity Enhanced DMI (PRECISE-DMI), was applied to simulation studies and in vivo experiments to evaluate the anticipated improvements in SNR and investigate the potential for inaccuracies. Results: PRECISE-DMI visually improved the metabolic maps of low SNR datasets, and quantitatively provided higher precision than the standard Fourier reconstruction. Processing of DMI data acquired in rat brain tumor models resulted in more precise determination of 2H-labeled lactate and glutamate + glutamine levels, at increased spatial resolution (from >8 to 2 $\mu$L) or shortened scan time (from 32 to 4 min) compared to standard acquisitions. However, rigorous SD-bias analyses showed that overuse of the edge-preserving regularization can compromise the accuracy of the results. Conclusion: PRECISE-DMI allows a flexible trade-off between enhancing the sensitivity of DMI and minimizing the inaccuracies. With typical settings, the DMI sensitivity can be improved by 3-fold while retaining the capability to detect local signal variations.
    Massively-Parallel Heat Map Sorting and Applications To Explainable Clustering. (arXiv:2309.07486v1 [cs.DS])
    Given a set of points labeled with $k$ labels, we introduce the heat map sorting problem as reordering and merging the points and dimensions while preserving the clusters (labels). A cluster is preserved if it remains connected, i.e., if it is not split into several clusters and no two clusters are merged. We prove the problem is NP-hard and we give a fixed-parameter algorithm with a constant number of rounds in the massively parallel computation model, where each machine has a sublinear memory and the total memory of the machines is linear. We give an approximation algorithm for a NP-hard special case of the problem. We empirically compare our algorithm with k-means and density-based clustering (DBSCAN) using a dimensionality reduction via locality-sensitive hashing on several directed and undirected graphs of email and computer networks.
    Efficient Spatially Sparse Inference for Conditional GANs and Diffusion Models. (arXiv:2211.02048v4 [cs.CV] UPDATED)
    During image editing, existing deep generative models tend to re-synthesize the entire output from scratch, including the unedited regions. This leads to a significant waste of computation, especially for minor editing operations. In this work, we present Spatially Sparse Inference (SSI), a general-purpose technique that selectively performs computation for edited regions and accelerates various generative models, including both conditional GANs and diffusion models. Our key observation is that users prone to gradually edit the input image. This motivates us to cache and reuse the feature maps of the original image. Given an edited image, we sparsely apply the convolutional filters to the edited regions while reusing the cached features for the unedited areas. Based on our algorithm, we further propose Sparse Incremental Generative Engine (SIGE) to convert the computation reduction to latency reduction on off-the-shelf hardware. With about $1\%$-area edits, SIGE accelerates DDPM by $3.0\times$ on NVIDIA RTX 3090 and $4.6\times$ on Apple M1 Pro GPU, Stable Diffusion by $7.2\times$ on 3090, and GauGAN by $5.6\times$ on 3090 and $5.2\times$ on M1 Pro GPU. Compared to our conference version, we extend SIGE to accommodate attention layers and apply it to Stable Diffusion. Additionally, we offer support for Apple M1 Pro GPU and include more results with large and sequential edits.
    EnCodecMAE: Leveraging neural codecs for universal audio representation learning. (arXiv:2309.07391v1 [cs.SD])
    The goal of universal audio representation learning is to obtain foundational models that can be used for a variety of downstream tasks involving speech, music or environmental sounds. To approach this problem, methods inspired by self-supervised models from NLP, like BERT, are often used and adapted to audio. These models rely on the discrete nature of text, hence adopting this type of approach for audio processing requires either a change in the learning objective or mapping the audio signal to a set of discrete classes. In this work, we explore the use of EnCodec, a neural audio codec, to generate discrete targets for learning an universal audio model based on a masked autoencoder (MAE). We evaluate this approach, which we call EncodecMAE, on a wide range of audio tasks spanning speech, music and environmental sounds, achieving performances comparable or better than leading audio representation models.
    Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data. (arXiv:2202.05928v4 [cs.LG] UPDATED)
    Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.
    Simultaneous inference for generalized linear models with unmeasured confounders. (arXiv:2309.07261v1 [stat.ME])
    Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.
    DGSD: Dynamical Graph Self-Distillation for EEG-Based Auditory Spatial Attention Detection. (arXiv:2309.07147v1 [eess.SP])
    Auditory Attention Detection (AAD) aims to detect target speaker from brain signals in a multi-speaker environment. Although EEG-based AAD methods have shown promising results in recent years, current approaches primarily rely on traditional convolutional neural network designed for processing Euclidean data like images. This makes it challenging to handle EEG signals, which possess non-Euclidean characteristics. In order to address this problem, this paper proposes a dynamical graph self-distillation (DGSD) approach for AAD, which does not require speech stimuli as input. Specifically, to effectively represent the non-Euclidean properties of EEG signals, dynamical graph convolutional networks are applied to represent the graph structure of EEG signals, which can also extract crucial features related to auditory spatial attention in EEG signals. In addition, to further improve AAD detection performance, self-distillation, consisting of feature distillation and hierarchical distillation strategies at each layer, is integrated. These strategies leverage features and classification results from the deepest network layers to guide the learning of shallow layers. Our experiments are conducted on two publicly available datasets, KUL and DTU. Under a 1-second time window, we achieve results of 90.0\% and 79.6\% accuracy on KUL and DTU, respectively. We compare our DGSD method with competitive baselines, and the experimental results indicate that the detection performance of our proposed DGSD method is not only superior to the best reproducible baseline but also significantly reduces the number of trainable parameters by approximately 100 times.
    Systematic Review of Experimental Paradigms and Deep Neural Networks for Electroencephalography-Based Cognitive Workload Detection. (arXiv:2309.07163v1 [eess.SP])
    This article summarizes a systematic review of the electroencephalography (EEG)-based cognitive workload (CWL) estimation. The focus of the article is twofold: identify the disparate experimental paradigms used for reliably eliciting discreet and quantifiable levels of cognitive load and the specific nature and representational structure of the commonly used input formulations in deep neural networks (DNNs) used for signal classification. The analysis revealed a number of studies using EEG signals in its native representation of a two-dimensional matrix for offline classification of CWL. However, only a few studies adopted an online or pseudo-online classification strategy for real-time CWL estimation. Further, only a couple of interpretable DNNs and a single generative model were employed for cognitive load detection till date during this review. More often than not, researchers were using DNNs as black-box type models. In conclusion, DNNs prove to be valuable tools for classifying EEG signals, primarily due to the substantial modeling power provided by the depth of their network architecture. It is further suggested that interpretable and explainable DNN models must be employed for cognitive workload estimation since existing methods are limited in the face of the non-stationary nature of the signal.
    The Grand Illusion: The Myth of Software Portability and Implications for ML Progress. (arXiv:2309.07181v1 [cs.SE])
    Pushing the boundaries of machine learning often requires exploring different hardware and software combinations. However, the freedom to experiment across different tooling stacks can be at odds with the drive for efficiency, which has produced increasingly specialized AI hardware and incentivized consolidation around a narrow set of ML frameworks. Exploratory research can be restricted if software and hardware are co-evolving, making it even harder to stray away from mainstream ideas that work well with popular tooling stacks. While this friction increasingly impacts the rate of innovation in machine learning, to our knowledge the lack of portability in tooling has not been quantified. In this work, we ask: How portable are popular ML software frameworks? We conduct a large-scale study of the portability of mainstream ML frameworks across different hardware types. Our findings paint an uncomfortable picture -- frameworks can lose more than 40% of their key functions when ported to other hardware. Worse, even when functions are portable, the slowdown in their performance can be extreme and render performance untenable. Collectively, our results reveal how costly straying from a narrow set of hardware-software combinations can be - and suggest that specialization of hardware impedes innovation in machine learning research.
    The effect of data augmentation and 3D-CNN depth on Alzheimer's Disease detection. (arXiv:2309.07192v1 [eess.IV])
    Machine Learning (ML) has emerged as a promising approach in healthcare, outperforming traditional statistical techniques. However, to establish ML as a reliable tool in clinical practice, adherence to best practices regarding data handling, experimental design, and model evaluation is crucial. This work summarizes and strictly observes such practices to ensure reproducible and reliable ML. Specifically, we focus on Alzheimer's Disease (AD) detection, which serves as a paradigmatic example of challenging problem in healthcare. We investigate the impact of different data augmentation techniques and model complexity on the overall performance. We consider MRI data from ADNI dataset to address a classification problem employing 3D Convolutional Neural Network (CNN). The experiments are designed to compensate for data scarcity and initial random parameters by utilizing cross-validation and multiple training trials. Within this framework, we train 15 predictive models, considering three different data augmentation strategies and five distinct 3D CNN architectures, each varying in the number of convolutional layers. Specifically, the augmentation strategies are based on affine transformations, such as zoom, shift, and rotation, applied concurrently or separately. The combined effect of data augmentation and model complexity leads to a variation in prediction performance up to 10% of accuracy. When affine transformation are applied separately, the model is more accurate, independently from the adopted architecture. For all strategies, the model accuracy followed a concave behavior at increasing number of convolutional layers, peaking at an intermediate value of layers. The best model (8 CL, (B)) is the most stable across cross-validation folds and training trials, reaching excellent performance both on the testing set and on an external test set.
    Improved Auto-Encoding using Deterministic Projected Belief Networks. (arXiv:2309.07481v1 [cs.LG])
    In this paper, we exploit the unique properties of a deterministic projected belief network (D-PBN) to take full advantage of trainable compound activation functions (TCAs). A D-PBN is a type of auto-encoder that operates by "backing up" through a feed-forward neural network. TCAs are activation functions with complex monotonic-increasing shapes that change the distribution of the data so that the linear transformation that follows is more effective. Because a D-PBN operates by "backing up", the TCAs are inverted in the reconstruction process, restoring the original distribution of the data, thus taking advantage of a given TCA in both analysis and reconstruction. In this paper, we show that a D-PBN auto-encoder with TCAs can significantly out-perform standard auto-encoders including variational auto-encoders.
    Statistically Valid Variable Importance Assessment through Conditional Permutations. (arXiv:2309.07593v1 [cs.LG])
    Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An empirical benchmark on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as drop-in replacement for permutation-based methods.
    Virchow: A Million-Slide Digital Pathology Foundation Model. (arXiv:2309.07778v1 [eess.IV])
    Computational pathology uses artificial intelligence to enable precision medicine and decision support systems through the analysis of whole slide images. It has the potential to revolutionize the diagnosis and treatment of cancer. However, a major challenge to this objective is that for many specific computational pathology tasks the amount of data is inadequate for development. To address this challenge, we created Virchow, a 632 million parameter deep neural network foundation model for computational pathology. Using self-supervised learning, Virchow is trained on 1.5 million hematoxylin and eosin stained whole slide images from diverse tissue groups, which is orders of magnitude more data than previous works. When evaluated on downstream tasks including tile-level pan-cancer detection and subtyping and slide-level biomarker prediction, Virchow outperforms state-of-the-art systems both on internal datasets drawn from the same population as the pretraining data as well as external public datasets. Virchow achieves 93% balanced accuracy for pancancer tile classification, and AUCs of 0.983 for colon microsatellite instability status prediction and 0.967 for breast CDH1 status prediction. The gains in performance highlight the importance of pretraining on massive pathology image datasets, suggesting pretraining on even larger datasets could continue improving performance for many high-impact applications where limited amounts of training data are available, such as drug outcome prediction.
    PRE: Vision-Language Prompt Learning with Reparameterization Encoder. (arXiv:2309.07760v1 [cs.CV])
    Large pre-trained vision-language models such as CLIP have demonstrated great potential in zero-shot transferability to downstream tasks. However, to attain optimal performance, the manual selection of prompts is necessary to improve alignment between the downstream image distribution and the textual class descriptions. This manual prompt engineering is the major challenge for deploying such models in practice since it requires domain expertise and is extremely time-consuming. To avoid non-trivial prompt engineering, recent work Context Optimization (CoOp) introduced the concept of prompt learning to the vision domain using learnable textual tokens. While CoOp can achieve substantial improvements over manual prompts, its learned context is worse generalizable to wider unseen classes within the same dataset. In this work, we present Prompt Learning with Reparameterization Encoder (PRE) - a simple and efficient method that enhances the generalization ability of the learnable prompt to unseen classes while maintaining the capacity to learn Base classes. Instead of directly optimizing the prompts, PRE employs a prompt encoder to reparameterize the input prompt embeddings, enhancing the exploration of task-specific knowledge from few-shot samples. Experiments and extensive ablation studies on 8 benchmarks demonstrate that our approach is an efficient method for prompt learning. Specifically, PRE achieves a notable enhancement of 5.60% in average accuracy on New classes and 3% in Harmonic mean compared to CoOp in the 16-shot setting, all achieved within a good training time.
    Advancing Regular Language Reasoning in Linear Recurrent Neural Networks. (arXiv:2309.07412v1 [cs.CL])
    In recent studies, linear recurrent neural networks (LRNNs) have achieved Transformer-level performance in natural language modeling and long-range modeling while offering rapid parallel training and constant inference costs. With the resurged interest in LRNNs, we study whether they can learn the hidden rules in training sequences, such as the grammatical structures of regular language. We theoretically analyze some existing LRNNs and discover their limitations on regular language. Motivated by the analysis, we propose a new LRNN equipped with a block-diagonal and input-dependent transition matrix. Experiments suggest that the proposed model is the only LRNN that can perform length extrapolation on regular language tasks such as Sum, Even Pair, and Modular Arithmetic.
    Learning Beyond Similarities: Incorporating Dissimilarities between Positive Pairs in Self-Supervised Time Series Learning. (arXiv:2309.07526v1 [cs.LG])
    By identifying similarities between successive inputs, Self-Supervised Learning (SSL) methods for time series analysis have demonstrated their effectiveness in encoding the inherent static characteristics of temporal data. However, an exclusive emphasis on similarities might result in representations that overlook the dynamic attributes critical for modeling cardiovascular diseases within a confined subject cohort. Introducing Distilled Encoding Beyond Similarities (DEBS), this paper pioneers an SSL approach that transcends mere similarities by integrating dissimilarities among positive pairs. The framework is applied to electrocardiogram (ECG) signals, leading to a notable enhancement of +10\% in the detection accuracy of Atrial Fibrillation (AFib) across diverse subjects. DEBS underscores the potential of attaining a more refined representation by encoding the dynamic characteristics of time series data, tapping into dissimilarities during the optimization process. Broadly, the strategy delineated in this study holds the promise of unearthing novel avenues for advancing SSL methodologies tailored to temporal data.
    Unbiased Face Synthesis With Diffusion Models: Are We There Yet?. (arXiv:2309.07277v1 [cs.CV])
    Text-to-image diffusion models have achieved widespread popularity due to their unprecedented image generation capability. In particular, their ability to synthesize and modify human faces has spurred research into using generated face images in both training data augmentation and model performance assessments. In this paper, we study the efficacy and shortcomings of generative models in the context of face generation. Utilizing a combination of qualitative and quantitative measures, including embedding-based metrics and user studies, we present a framework to audit the characteristics of generated faces conditioned on a set of social attributes. We applied our framework on faces generated through state-of-the-art text-to-image diffusion models. We identify several limitations of face image generation that include faithfulness to the text prompt, demographic disparities, and distributional shifts. Furthermore, we present an analytical model that provides insights into how training data selection contributes to the performance of generative models.
    SC-MAD: Mixtures of Higher-order Networks for Data Augmentation. (arXiv:2309.07453v1 [stat.ML])
    The myriad complex systems with multiway interactions motivate the extension of graph-based pairwise connections to higher-order relations. In particular, the simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models. Learning on such systems requires large amounts of data, which can be expensive or impossible to obtain. We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples. In addition to traditional pairwise mixup, we present a convex clustering mixup approach for a data-driven relationship among several simplicial complexes. We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities. Our method is demonstrated on both synthetic and real-world datasets for simplicial complex classification.
    Mitigating Adversarial Attacks in Federated Learning with Trusted Execution Environments. (arXiv:2309.07197v1 [cs.LG])
    The main premise of federated learning (FL) is that machine learning model updates are computed locally to preserve user data privacy. This approach avoids by design user data to ever leave the perimeter of their device. Once the updates aggregated, the model is broadcast to all nodes in the federation. However, without proper defenses, compromised nodes can probe the model inside their local memory in search for adversarial examples, which can lead to dangerous real-world scenarios. For instance, in image-based applications, adversarial examples consist of images slightly perturbed to the human eye getting misclassified by the local model. These adversarial images are then later presented to a victim node's counterpart model to replay the attack. Typical examples harness dissemination strategies such as altered traffic signs (patch attacks) no longer recognized by autonomous vehicles or seemingly unaltered samples that poison the local dataset of the FL scheme to undermine its robustness. Pelta is a novel shielding mechanism leveraging Trusted Execution Environments (TEEs) that reduce the ability of attackers to craft adversarial samples. Pelta masks inside the TEE the first part of the back-propagation chain rule, typically exploited by attackers to craft the malicious samples. We evaluate Pelta on state-of-the-art accurate models using three well-established datasets: CIFAR-10, CIFAR-100 and ImageNet. We show the effectiveness of Pelta in mitigating six white-box state-of-the-art adversarial attacks, such as Projected Gradient Descent, Momentum Iterative Method, Auto Projected Gradient Descent, the Carlini & Wagner attack. In particular, Pelta constitutes the first attempt at defending an ensemble model against the Self-Attention Gradient attack to the best of our knowledge. Our code is available to the research community at https://github.com/queyrusi/Pelta.
    Exploring Large Language Models for Ontology Alignment. (arXiv:2309.07172v1 [cs.AI])
    This work investigates the applicability of recent generative Large Language Models (LLMs), such as the GPT series and Flan-T5, to ontology alignment for identifying concept equivalence mappings across ontologies. To test the zero-shot performance of Flan-T5-XXL and GPT-3.5-turbo, we leverage challenging subsets from two equivalence matching datasets of the OAEI Bio-ML track, taking into account concept labels and structural contexts. Preliminary findings suggest that LLMs have the potential to outperform existing ontology alignment systems like BERTMap, given careful framework and prompt design.
    Frequency Convergence of Complexon Shift Operators. (arXiv:2309.07169v1 [eess.SP])
    Topological signal processing (TSP) utilizes simplicial complexes to model structures with higher order than vertices and edges. In this paper, we study the transferability of TSP via a generalized higher-order version of graphon, known as complexon. We recall the notion of a complexon as the limit of a simplicial complex sequence [1]. Inspired by the integral operator form of graphon shift operators, we construct a marginal complexon and complexon shift operator (CSO) according to components of all possible dimensions from the complexon. We investigate the CSO's eigenvalues and eigenvectors, and relate them to a new family of weighted adjacency matrices. We prove that when a simplicial complex sequence converges to a complexon, the eigenvalues of the corresponding CSOs converge to that of the limit complexon. These results hint at learning transferability on large simplicial complexes or simplicial complex sequences, which generalize the graphon signal processing framework.
    Audio-Based Classification of Respiratory Diseases using Advanced Signal Processing and Machine Learning for Assistive Diagnosis Support. (arXiv:2309.07183v1 [eess.SP])
    In global healthcare, respiratory diseases are a leading cause of mortality, underscoring the need for rapid and accurate diagnostics. To advance rapid screening techniques via auscultation, our research focuses on employing one of the largest publicly available medical database of respiratory sounds to train multiple machine learning models able to classify different health conditions. Our method combines Empirical Mode Decomposition (EMD) and spectral analysis to extract physiologically relevant biosignals from acoustic data, closely tied to cardiovascular and respiratory patterns, making our approach apart in its departure from conventional audio feature extraction practices. We use Power Spectral Density analysis and filtering techniques to select Intrinsic Mode Functions (IMFs) strongly correlated with underlying physiological phenomena. These biosignals undergo a comprehensive feature extraction process for predictive modeling. Initially, we deploy a binary classification model that demonstrates a balanced accuracy of 87% in distinguishing between healthy and diseased individuals. Subsequently, we employ a six-class classification model that achieves a balanced accuracy of 72% in diagnosing specific respiratory conditions like pneumonia and chronic obstructive pulmonary disease (COPD). For the first time, we also introduce regression models that estimate age and body mass index (BMI) based solely on acoustic data, as well as a model for gender classification. Our findings underscore the potential of this approach to significantly enhance assistive and remote diagnostic capabilities.
    Boosting Unsupervised Contrastive Learning Using Diffusion-Based Data Augmentation From Scratch. (arXiv:2309.07909v1 [cs.LG])
    Unsupervised contrastive learning methods have recently seen significant improvements, particularly through data augmentation strategies that aim to produce robust and generalizable representations. However, prevailing data augmentation methods, whether hand designed or based on foundation models, tend to rely heavily on prior knowledge or external data. This dependence often compromises their effectiveness and efficiency. Furthermore, the applicability of most existing data augmentation strategies is limited when transitioning to other research domains, especially science-related data. This limitation stems from the paucity of prior knowledge and labeled data available in these domains. To address these challenges, we introduce DiffAug-a novel and efficient Diffusion-based data Augmentation technique. DiffAug aims to ensure that the augmented and original data share a smoothed latent space, which is achieved through diffusion steps. Uniquely, unlike traditional methods, DiffAug first mines sufficient prior semantic knowledge about the neighborhood. This provides a constraint to guide the diffusion steps, eliminating the need for labels, external data/models, or prior knowledge. Designed as an architecture-agnostic framework, DiffAug provides consistent improvements. Specifically, it improves image classification and clustering accuracy by 1.6%~4.5%. When applied to biological data, DiffAug improves performance by up to 10.1%, with an average improvement of 5.8%. DiffAug shows good performance in both vision and biological domains.
    Distribution Grid Line Outage Identification with Unknown Pattern and Performance Guarantee. (arXiv:2309.07157v1 [cs.LG])
    Line outage identification in distribution grids is essential for sustainable grid operation. In this work, we propose a practical yet robust detection approach that utilizes only readily available voltage magnitudes, eliminating the need for costly phase angles or power flow data. Given the sensor data, many existing detection methods based on change-point detection require prior knowledge of outage patterns, which are unknown for real-world outage scenarios. To remove this impractical requirement, we propose a data-driven method to learn the parameters of the post-outage distribution through gradient descent. However, directly using gradient descent presents feasibility issues. To address this, we modify our approach by adding a Bregman divergence constraint to control the trajectory of the parameter updates, which eliminates the feasibility problems. As timely operation is the key nowadays, we prove that the optimal parameters can be learned with convergence guarantees via leveraging the statistical and physical properties of voltage data. We evaluate our approach using many representative distribution grids and real load profiles with 17 outage configurations. The results show that we can detect and localize the outage in a timely manner with only voltage magnitudes and without assuming a prior knowledge of outage patterns.
    Effective Latent Differential Equation Models via Attention and Multiple Shooting. (arXiv:2307.05735v3 [cs.LG] UPDATED)
    Scientific Machine Learning (SciML) is a burgeoning field that synergistically combines domain-aware and interpretable models with agnostic machine learning techniques. In this work, we introduce GOKU-UI, an evolution of the SciML generative model GOKU-nets. GOKU-UI not only broadens the original model's spectrum to incorporate other classes of differential equations, such as Stochastic Differential Equations (SDEs), but also integrates attention mechanisms and a novel multiple shooting training strategy in the latent space. These modifications have led to a significant increase in its performance in both reconstruction and forecast tasks, as demonstrated by our evaluation of simulated and empirical data. Specifically, GOKU-UI outperformed all baseline models on synthetic datasets even with a training set 16-fold smaller, underscoring its remarkable data efficiency. Furthermore, when applied to empirical human brain data, while incorporating stochastic Stuart-Landau oscillators into its dynamical core, our proposed enhancements markedly increased the model's effectiveness in capturing complex brain dynamics. This augmented version not only surpassed all baseline methods in the reconstruction task, but also demonstrated lower prediction error of future brain activity up to 15 seconds ahead. By training GOKU-UI on resting state fMRI data, we encoded whole-brain dynamics into a latent representation, learning a low-dimensional dynamical system model that could offer insights into brain functionality and open avenues for practical applications such as the classification of mental states or psychiatric conditions. Ultimately, our research provides further impetus for the field of Scientific Machine Learning, showcasing the potential for advancements when established scientific insights are interwoven with modern machine learning.
    A Strong and Simple Deep Learning Baseline for BCI MI Decoding. (arXiv:2309.07159v1 [eess.SP])
    We propose EEG-SimpleConv, a straightforward 1D convolutional neural network for Motor Imagery decoding in BCI. Our main motivation is to propose a very simple baseline to compare to, using only very standard ingredients from the literature. We evaluate its performance on four EEG Motor Imagery datasets, including simulated online setups, and compare it to recent Deep Learning and Machine Learning approaches. EEG-SimpleConv is at least as good or far more efficient than other approaches, showing strong knowledge-transfer capabilities across subjects, at the cost of a low inference time. We advocate that using off-the-shelf ingredients rather than coming with ad-hoc solutions can significantly help the adoption of Deep Learning approaches for BCI. We make the code of the models and the experiments accessible.
    An Optimal Control Method to Compute the Most Likely Transition Path for Stochastic Dynamical Systems with Jumps. (arXiv:2203.16874v2 [math.NA] UPDATED)
    Many complex real world phenomena exhibit abrupt, intermittent or jumping behaviors, which are more suitable to be described by stochastic differential equations under non-Gaussian L\'evy noise. Among these complex phenomena, the most likely transition paths between metastable states are important since these rare events may have a high impact in certain scenarios. Based on the large deviation principle, the most likely transition path could be treated as the minimizer of the rate function upon paths that connect two points. One of the challenges to calculate the most likely transition path for stochastic dynamical systems under non-Gaussian L\'evy noise is that the associated rate function can not be explicitly expressed by paths. For this reason, we formulate an optimal control problem to obtain the optimal state as the most likely transition path. We then develop a neural network method to solve this issue. Several experiments are investigated for both Gaussian and non-Gaussian cases.
    Decoding visual brain representations from electroencephalography through Knowledge Distillation and latent diffusion models. (arXiv:2309.07149v1 [eess.SP])
    Decoding visual representations from human brain activity has emerged as a thriving research domain, particularly in the context of brain-computer interfaces. Our study presents an innovative method that employs to classify and reconstruct images from the ImageNet dataset using electroencephalography (EEG) data from subjects that had viewed the images themselves (i.e. "brain decoding"). We analyzed EEG recordings from 6 participants, each exposed to 50 images spanning 40 unique semantic categories. These EEG readings were converted into spectrograms, which were then used to train a convolutional neural network (CNN), integrated with a knowledge distillation procedure based on a pre-trained Contrastive Language-Image Pre-Training (CLIP)-based image classification teacher network. This strategy allowed our model to attain a top-5 accuracy of 80%, significantly outperforming a standard CNN and various RNN-based benchmarks. Additionally, we incorporated an image reconstruction mechanism based on pre-trained latent diffusion models, which allowed us to generate an estimate of the images which had elicited EEG activity. Therefore, our architecture not only decodes images from neural activity but also offers a credible image reconstruction from EEG only, paving the way for e.g. swift, individualized feedback experiments. Our research represents a significant step forward in connecting neural signals with visual cognition.
    Naturalistic Robot Arm Trajectory Generation via Representation Learning. (arXiv:2309.07550v1 [cs.RO])
    The integration of manipulator robots in household environments suggests a need for more predictable and human-like robot motion. This holds especially true for wheelchair-mounted assistive robots that can support the independence of people with paralysis. One method of generating naturalistic motion trajectories is via the imitation of human demonstrators. This paper explores a self-supervised imitation learning method using an autoregressive spatio-temporal graph neural network for an assistive drinking task. We address learning from diverse human motion trajectory data that were captured via wearable IMU sensors on a human arm as the action-free task demonstrations. Observed arm motion data from several participants is used to generate natural and functional drinking motion trajectories for a UR5e robot arm.
    Beta quantile regression for robust estimation of uncertainty in the presence of outliers. (arXiv:2309.07374v1 [cs.LG])
    Quantile Regression (QR) can be used to estimate aleatoric uncertainty in deep neural networks and can generate prediction intervals. Quantifying uncertainty is particularly important in critical applications such as clinical diagnosis, where a realistic assessment of uncertainty is essential in determining disease status and planning the appropriate treatment. The most common application of quantile regression models is in cases where the parametric likelihood cannot be specified. Although quantile regression is quite robust to outlier response observations, it can be sensitive to outlier covariate observations (features). Outlier features can compromise the performance of deep learning regression problems such as style translation, image reconstruction, and deep anomaly detection, potentially leading to misleading conclusions. To address this problem, we propose a robust solution for quantile regression that incorporates concepts from robust divergence. We compare the performance of our proposed method with (i) least trimmed quantile regression and (ii) robust regression based on the regularization of case-specific parameters in a simple real dataset in the presence of outlier. These methods have not been applied in a deep learning framework. We also demonstrate the applicability of the proposed method by applying it to a medical imaging translation task using diffusion models.
    Overview of Human Activity Recognition Using Sensor Data. (arXiv:2309.07170v1 [eess.SP])
    Human activity recognition (HAR) is an essential research field that has been used in different applications including home and workplace automation, security and surveillance as well as healthcare. Starting from conventional machine learning methods to the recently developing deep learning techniques and the Internet of things, significant contributions have been shown in the HAR area in the last decade. Even though several review and survey studies have been published, there is a lack of sensor-based HAR overview studies focusing on summarising the usage of wearable sensors and smart home sensors data as well as applications of HAR and deep learning techniques. Hence, we overview sensor-based HAR, discuss several important applications that rely on HAR, and highlight the most common machine learning methods that have been used for HAR. Finally, several challenges of HAR are explored that should be addressed to further improve the robustness of HAR.
    Tackling the dimensions in imaging genetics with CLUB-PLS. (arXiv:2309.07352v1 [q-bio.GN])
    A major challenge in imaging genetics and similar fields is to link high-dimensional data in one domain, e.g., genetic data, to high dimensional data in a second domain, e.g., brain imaging data. The standard approach in the area are mass univariate analyses across genetic factors and imaging phenotypes. That entails executing one genome-wide association study (GWAS) for each pre-defined imaging measure. Although this approach has been tremendously successful, one shortcoming is that phenotypes must be pre-defined. Consequently, effects that are not confined to pre-selected regions of interest or that reflect larger brain-wide patterns can easily be missed. In this work we introduce a Partial Least Squares (PLS)-based framework, which we term Cluster-Bootstrap PLS (CLUB-PLS), that can work with large input dimensions in both domains as well as with large sample sizes. One key factor of the framework is to use cluster bootstrap to provide robust statistics for single input features in both domains. We applied CLUB-PLS to investigating the genetic basis of surface area and cortical thickness in a sample of 33,000 subjects from the UK Biobank. We found 107 genome-wide significant locus-phenotype pairs that are linked to 386 different genes. We found that a vast majority of these loci could be technically validated at a high rate: using classic GWAS or Genome-Wide Inferred Statistics (GWIS) we found that 85 locus-phenotype pairs exceeded the genome-wide suggestive (P<1e-05) threshold.
    Sleep Stage Classification Using a Pre-trained Deep Learning Model. (arXiv:2309.07182v1 [eess.SP])
    One of the common human diseases is sleep disorders. The classification of sleep stages plays a fundamental role in diagnosing sleep disorders, monitoring treatment effectiveness, and understanding the relationship between sleep stages and various health conditions. A precise and efficient classification of these stages can significantly enhance our understanding of sleep-related phenomena and ultimately lead to improved health outcomes and disease treatment. Models others propose are often time-consuming and lack sufficient accuracy, especially in stage N1. The main objective of this research is to present a machine-learning model called "EEGMobile". This model utilizes pre-trained models and learns from electroencephalogram (EEG) spectrograms of brain signals. The model achieved an accuracy of 86.97% on a publicly available dataset named "Sleep-EDF20", outperforming other models proposed by different researchers. Moreover, it recorded an accuracy of 56.4% in stage N1, which is better than other models. These findings demonstrate that this model has the potential to achieve better results for the treatment of this disease.
    Benchmarking machine learning models for quantum state classification. (arXiv:2309.07679v1 [quant-ph])
    Quantum computing is a growing field where the information is processed by two-levels quantum states known as qubits. Current physical realizations of qubits require a careful calibration, composed by different experiments, due to noise and decoherence phenomena. Among the different characterization experiments, a crucial step is to develop a model to classify the measured state by discriminating the ground state from the excited state. In this proceedings we benchmark multiple classification techniques applied to real quantum devices.
    Design of Recognition and Evaluation System for Table Tennis Players' Motor Skills Based on Artificial Intelligence. (arXiv:2309.07141v1 [eess.SP])
    With the rapid development of electronic science and technology, the research on wearable devices is constantly updated, but for now, it is not comprehensive for wearable devices to recognize and analyze the movement of specific sports. Based on this, this paper improves wearable devices of table tennis sport, and realizes the pattern recognition and evaluation of table tennis players' motor skills through artificial intelligence. Firstly, a device is designed to collect the movement information of table tennis players and the actual movement data is processed. Secondly, a sliding window is made to divide the collected motion data into a characteristic database of six table tennis benchmark movements. Thirdly, motion features were constructed based on feature engineering, and motor skills were identified for different models after dimensionality reduction. Finally, the hierarchical evaluation system of motor skills is established with the loss functions of different evaluation indexes. The results show that in the recognition of table tennis players' motor skills, the feature-based BP neural network proposed in this paper has higher recognition accuracy and stronger generalization ability than the traditional convolutional neural network.
    Finding Influencers in Complex Networks: An Effective Deep Reinforcement Learning Approach. (arXiv:2309.07153v1 [cs.SI])
    Maximizing influences in complex networks is a practically important but computationally challenging task for social network analysis, due to its NP- hard nature. Most current approximation or heuristic methods either require tremendous human design efforts or achieve unsatisfying balances between effectiveness and efficiency. Recent machine learning attempts only focus on speed but lack performance enhancement. In this paper, different from previous attempts, we propose an effective deep reinforcement learning model that achieves superior performances over traditional best influence maximization algorithms. Specifically, we design an end-to-end learning framework that combines graph neural network as the encoder and reinforcement learning as the decoder, named DREIM. Trough extensive training on small synthetic graphs, DREIM outperforms the state-of-the-art baseline methods on very large synthetic and real-world networks on solution quality, and we also empirically show its linear scalability with regard to the network size, which demonstrates its superiority in solving this problem.
    A Deep Dive into Sleep: Single-Channel EEG-Based Sleep Stage Classification with Model Interpretability. (arXiv:2309.07156v1 [eess.SP])
    Sleep, a fundamental physiological process, occupies a significant portion of our lives. Accurate classification of sleep stages serves as a crucial tool for evaluating sleep quality and identifying probable sleep disorders. This work introduces a novel methodology that utilises a SE-Resnet-Bi-LSTM architecture to classify sleep into five separate stages. The classification process is based on the analysis of single-channel electroencephalograms (EEGs). The framework that has been suggested consists of two fundamental elements: a feature extractor that utilises SE-ResNet, and a temporal context encoder that use stacks of Bi-LSTM units.The effectiveness of our approach is substantiated by thorough assessments conducted on three different datasets, namely SLeepEDF-20, SleepEDF-78, and SHHS. Significantly, our methodology attains notable levels of accuracy, specifically 87.5\%, 83.9\%, and 87.8\%, along with macro-F1 scores of 82.5, 78.9, and 81.9 for the corresponding datasets. Notably, we introduce the utilization of 1D-GradCAM visualization to shed light on the decision-making process of our model in the realm of sleep stage classification. This visualization method not only provides valuable insights into the model's classification rationale but also aligns its outcomes with the annotations made by sleep experts. One notable feature of our research is the integration of an expedited training approach, which effectively preserves the model's resilience in terms of performance. The experimental evaluations conducted provide a comprehensive evaluation of the effectiveness of our proposed model in comparison to existing approaches, highlighting its potential for practical applications.
    Direct Text to Speech Translation System using Acoustic Units. (arXiv:2309.07478v1 [cs.CL])
    This paper proposes a direct text to speech translation system using discrete acoustic units. This framework employs text in different source languages as input to generate speech in the target language without the need for text transcriptions in this language. Motivated by the success of acoustic units in previous works for direct speech to speech translation systems, we use the same pipeline to extract the acoustic units using a speech encoder combined with a clustering algorithm. Once units are obtained, an encoder-decoder architecture is trained to predict them. Then a vocoder generates speech from units. Our approach for direct text to speech translation was tested on the new CVSS corpus with two different text mBART models employed as initialisation. The systems presented report competitive performance for most of the language pairs evaluated. Besides, results show a remarkable improvement when initialising our proposed architecture with a model pre-trained with more languages.
    Generating Parametric BRDFs from Natural Language Descriptions. (arXiv:2306.15679v2 [cs.GR] UPDATED)
    Artistic authoring of 3D environments is a laborious enterprise that also requires skilled content creators. There have been impressive improvements in using machine learning to address different aspects of generating 3D content, such as generating meshes, arranging geometry, synthesizing textures, etc. In this paper we develop a model to generate Bidirectional Reflectance Distribution Functions (BRDFs) from descriptive textual prompts. BRDFs are four dimensional probability distributions that characterize the interaction of light with surface materials. They are either represented parametrically, or by tabulating the probability density associated with every pair of incident and outgoing angles. The former lends itself to artistic editing while the latter is used when measuring the appearance of real materials. Numerous works have focused on hypothesizing BRDF models from images of materials. We learn a mapping from textual descriptions of materials to parametric BRDFs. Our model is first trained using a semi-supervised approach before being tuned via an unsupervised scheme. Although our model is general, in this paper we specifically generate parameters for MDL materials, conditioned on natural language descriptions, within NVIDIA's Omniverse platform. This enables use cases such as real-time text prompts to change materials of objects in 3D environments such as "dull plastic" or "shiny iron". Since the output of our model is a parametric BRDF, rather than an image of the material, it may be used to render materials using any shape under arbitrarily specified viewing and lighting conditions.
    Model Reprogramming: Resource-Efficient Cross-Domain Machine Learning. (arXiv:2202.10629v3 [cs.LG] UPDATED)
    In data-rich domains such as vision, language, and speech, deep learning prevails to deliver high-performance task-specific models and can even learn general task-agnostic representations for efficient finetuning to downstream tasks. However, deep learning in resource-limited domains still faces multiple challenges including (i) limited data, (ii) constrained model development cost, and (iii) lack of adequate pre-trained models for effective finetuning. This paper provides an overview of model reprogramming to bridge this gap. Model reprogramming enables resource-efficient cross-domain machine learning by repurposing and reusing a well-developed pre-trained model from a source domain to solve tasks in a target domain without model finetuning, where the source and target domains can be vastly different. In many applications, model reprogramming outperforms transfer learning and training from scratch. This paper elucidates the methodology of model reprogramming, summarizes existing use cases, provides a theoretical explanation of the success of model reprogramming, and concludes with a discussion on open-ended research questions and opportunities. A list of model reprogramming studies is actively maintained and updated at https://github.com/IBM/model-reprogramming.
    A Robust SINDy Approach by Combining Neural Networks and an Integral Form. (arXiv:2309.07193v1 [math.DS])
    The discovery of governing equations from data has been an active field of research for decades. One widely used methodology for this purpose is sparse regression for nonlinear dynamics, known as SINDy. Despite several attempts, noisy and scarce data still pose a severe challenge to the success of the SINDy approach. In this work, we discuss a robust method to discover nonlinear governing equations from noisy and scarce data. To do this, we make use of neural networks to learn an implicit representation based on measurement data so that not only it produces the output in the vicinity of the measurements but also the time-evolution of output can be described by a dynamical system. Additionally, we learn such a dynamic system in the spirit of the SINDy framework. Leveraging the implicit representation using neural networks, we obtain the derivative information -- required for SINDy -- using an automatic differentiation tool. To enhance the robustness of our methodology, we further incorporate an integral condition on the output of the implicit networks. Furthermore, we extend our methodology to handle data collected from multiple initial conditions. We demonstrate the efficiency of the proposed methodology to discover governing equations under noisy and scarce data regimes by means of several examples and compare its performance with existing methods.
    SpikeCP: Delay-Adaptive Reliable Spiking Neural Networks via Conformal Prediction. (arXiv:2305.11322v3 [cs.NE] UPDATED)
    Spiking neural networks (SNNs) process time-series data via internal event-driven neural dynamics whose energy consumption depends on the number of spikes exchanged between neurons over the course of the input presentation. In typical implementations of an SNN classifier, decisions are produced after the entire input sequence has been processed, resulting in latency and energy consumption levels that are fairly uniform across inputs. Recently introduced delay-adaptive SNNs tailor the inference latency -- and, with it, the energy consumption -- to the difficulty of each example, by producing an early decision when the SNN model is sufficiently ``confident''. In this paper, we start by observing that, as an SNN processes input samples, its classification decisions tend to be first under-confident and then over-confident with respect to the decision's ground-truth, unknown, test accuracy. This makes it difficult to determine a stopping time that ensures a desired level of accuracy. To address this problem, we introduce a novel delay-adaptive SNN-based inference methodology that, wrapping around any pre-trained SNN classifier, provides guaranteed reliability for the decisions produced at input-dependent stopping times. The approach entails minimal added complexity as compared to the underlying SNN, requiring only thresholding and counting operations at run time, and it leverages tools from conformal prediction (CP).
    Tree of Uncertain Thoughts Reasoning for Large Language Models. (arXiv:2309.07694v1 [cs.CL])
    While the recently introduced Tree of Thoughts (ToT) has heralded advancements in allowing Large Language Models (LLMs) to reason through foresight and backtracking for global decision-making, it has overlooked the inherent local uncertainties in intermediate decision points or "thoughts". These local uncertainties, intrinsic to LLMs given their potential for diverse responses, remain a significant concern in the reasoning process. Addressing this pivotal gap, we introduce the Tree of Uncertain Thoughts (TouT) - a reasoning framework tailored for LLMs. Our TouT effectively leverages Monte Carlo Dropout to quantify uncertainty scores associated with LLMs' diverse local responses at these intermediate steps. By marrying this local uncertainty quantification with global search algorithms, TouT enhances the model's precision in response generation. We substantiate our approach with rigorous experiments on two demanding planning tasks: Game of 24 and Mini Crosswords. The empirical evidence underscores TouT's superiority over both ToT and chain-of-thought prompting methods.
    Equivariant Data Augmentation for Generalization in Offline Reinforcement Learning. (arXiv:2309.07578v1 [cs.LG])
    We present a novel approach to address the challenge of generalization in offline reinforcement learning (RL), where the agent learns from a fixed dataset without any additional interaction with the environment. Specifically, we aim to improve the agent's ability to generalize to out-of-distribution goals. To achieve this, we propose to learn a dynamics model and check if it is equivariant with respect to a fixed type of transformation, namely translations in the state space. We then use an entropy regularizer to increase the equivariant set and augment the dataset with the resulting transformed samples. Finally, we learn a new policy offline based on the augmented dataset, with an off-the-shelf offline RL algorithm. Our experimental results demonstrate that our approach can greatly improve the test performance of the policy on the considered environments.
    BAARD: Blocking Adversarial Examples by Testing for Applicability, Reliability and Decidability. (arXiv:2105.00495v2 [cs.LG] UPDATED)
    Adversarial defenses protect machine learning models from adversarial attacks, but are often tailored to one type of model or attack. The lack of information on unknown potential attacks makes detecting adversarial examples challenging. Additionally, attackers do not need to follow the rules made by the defender. To address this problem, we take inspiration from the concept of Applicability Domain in cheminformatics. Cheminformatics models struggle to make accurate predictions because only a limited number of compounds are known and available for training. Applicability Domain defines a domain based on the known compounds and rejects any unknown compound that falls outside the domain. Similarly, adversarial examples start as harmless inputs, but can be manipulated to evade reliable classification by moving outside the domain of the classifier. We are the first to identify the similarity between Applicability Domain and adversarial detection. Instead of focusing on unknown attacks, we focus on what is known, the training data. We propose a simple yet robust triple-stage data-driven framework that checks the input globally and locally, and confirms that they are coherent with the model's output. This framework can be applied to any classification model and is not limited to specific attacks. We demonstrate these three stages work as one unit, effectively detecting various attacks, even for a white-box scenario.
    Joint Community Detection and Rotational Synchronization via Semidefinite Programming. (arXiv:2105.06031v2 [stat.ML] UPDATED)
    In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.
    A Latent Space Theory for Emergent Abilities in Large Language Models. (arXiv:2304.09960v3 [cs.CL] UPDATED)
    Languages are not created randomly but rather to communicate information. There is a strong association between languages and their underlying meanings, resulting in a sparse joint distribution that is heavily peaked according to their correlations. Moreover, these peak values happen to match with the marginal distribution of languages due to the sparsity. With the advent of LLMs trained on big data and large models, we can now precisely assess the marginal distribution of languages, providing a convenient means of exploring the sparse structures in the joint distribution for effective inferences. In this paper, we categorize languages as either unambiguous or {\epsilon}-ambiguous and present quantitative results to demonstrate that the emergent abilities of LLMs, such as language understanding, in-context learning, chain-of-thought prompting, and effective instruction fine-tuning, can all be attributed to Bayesian inference on the sparse joint distribution of languages.
    Federated Dataset Dictionary Learning for Multi-Source Domain Adaptation. (arXiv:2309.07670v1 [cs.LG])
    In this article, we propose an approach for federated domain adaptation, a setting where distributional shift exists among clients and some have unlabeled data. The proposed framework, FedDaDiL, tackles the resulting challenge through dictionary learning of empirical distributions. In our setting, clients' distributions represent particular domains, and FedDaDiL collectively trains a federated dictionary of empirical distributions. In particular, we build upon the Dataset Dictionary Learning framework by designing collaborative communication protocols and aggregation operations. The chosen protocols keep clients' data private, thus enhancing overall privacy compared to its centralized counterpart. We empirically demonstrate that our approach successfully generates labeled data on the target domain with extensive experiments on (i) Caltech-Office, (ii) TEP, and (iii) CWRU benchmarks. Furthermore, we compare our method to its centralized counterpart and other benchmarks in federated domain adaptation.
    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks. (arXiv:2302.07260v5 [cs.LG] UPDATED)
    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augmented the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained multi-fidelity optimization task involving shape optimization of rotor blades in turbo-machinery.
    On the complexity of finding a local minimizer of a quadratic function over a polytope. (arXiv:2008.05558v5 [math.OC] UPDATED)
    We show that unless P=NP, there cannot be a polynomial-time algorithm that finds a point within Euclidean distance $c^n$ (for any constant $c \ge 0$) of a local minimizer of an $n$-variate quadratic function over a polytope. This result (even with $c=0$) answers a question of Pardalos and Vavasis that appeared in 1992 on a list of seven open problems in complexity theory for numerical optimization. Our proof technique also implies that the problem of deciding whether a quadratic function has a local minimizer over an (unbounded) polyhedron, and that of deciding if a quartic polynomial has a local minimizer are NP-hard.
    Physics-constrained robust learning of open-form PDEs from limited and noisy data. (arXiv:2309.07672v1 [cs.LG])
    Unveiling the underlying governing equations of nonlinear dynamic systems remains a significant challenge, especially when encountering noisy observations and no prior knowledge available. This study proposes R-DISCOVER, a framework designed to robustly uncover open-form partial differential equations (PDEs) from limited and noisy data. The framework operates through two alternating update processes: discovering and embedding. The discovering phase employs symbolic representation and a reinforcement learning (RL)-guided hybrid PDE generator to efficiently produce diverse open-form PDEs with tree structures. A neural network-based predictive model fits the system response and serves as the reward evaluator for the generated PDEs. PDEs with superior fits are utilized to iteratively optimize the generator via the RL method and the best-performing PDE is selected by a parameter-free stability metric. The embedding phase integrates the initially identified PDE from the discovering process as a physical constraint into the predictive model for robust training. The traversal of PDE trees automates the construction of the computational graph and the embedding process without human intervention. Numerical experiments demonstrate our framework's capability to uncover governing equations from nonlinear dynamic systems with limited and highly noisy data and outperform other physics-informed neural network-based discovery methods. This work opens new potential for exploring real-world systems with limited understanding.
    TrojViT: Trojan Insertion in Vision Transformers. (arXiv:2208.13049v4 [cs.LG] UPDATED)
    Vision Transformers (ViTs) have demonstrated the state-of-the-art performance in various vision-related tasks. The success of ViTs motivates adversaries to perform backdoor attacks on ViTs. Although the vulnerability of traditional CNNs to backdoor attacks is well-known, backdoor attacks on ViTs are seldom-studied. Compared to CNNs capturing pixel-wise local features by convolutions, ViTs extract global context information through patches and attentions. Na\"ively transplanting CNN-specific backdoor attacks to ViTs yields only a low clean data accuracy and a low attack success rate. In this paper, we propose a stealth and practical ViT-specific backdoor attack $TrojViT$. Rather than an area-wise trigger used by CNN-specific backdoor attacks, TrojViT generates a patch-wise trigger designed to build a Trojan composed of some vulnerable bits on the parameters of a ViT stored in DRAM memory through patch salience ranking and attention-target loss. TrojViT further uses minimum-tuned parameter update to reduce the bit number of the Trojan. Once the attacker inserts the Trojan into the ViT model by flipping the vulnerable bits, the ViT model still produces normal inference accuracy with benign inputs. But when the attacker embeds a trigger into an input, the ViT model is forced to classify the input to a predefined target class. We show that flipping only few vulnerable bits identified by TrojViT on a ViT model using the well-known RowHammer can transform the model into a backdoored one. We perform extensive experiments of multiple datasets on various ViT models. TrojViT can classify $99.64\%$ of test images to a target class by flipping $345$ bits on a ViT for ImageNet.Our codes are available at https://github.com/mxzheng/TrojViT
    Safe and Accelerated Deep Reinforcement Learning-based O-RAN Slicing: A Hybrid Transfer Learning Approach. (arXiv:2309.07265v1 [cs.NI])
    The open radio access network (O-RAN) architecture supports intelligent network control algorithms as one of its core capabilities. Data-driven applications incorporate such algorithms to optimize radio access network (RAN) functions via RAN intelligent controllers (RICs). Deep reinforcement learning (DRL) algorithms are among the main approaches adopted in the O-RAN literature to solve dynamic radio resource management problems. However, despite the benefits introduced by the O-RAN RICs, the practical adoption of DRL algorithms in real network deployments falls behind. This is primarily due to the slow convergence and unstable performance exhibited by DRL agents upon deployment and when facing previously unseen network conditions. In this paper, we address these challenges by proposing transfer learning (TL) as a core component of the training and deployment workflows for the DRL-based closed-loop control of O-RAN functionalities. To this end, we propose and design a hybrid TL-aided approach that leverages the advantages of both policy reuse and distillation TL methods to provide safe and accelerated convergence in DRL-based O-RAN slicing. We conduct a thorough experiment that accommodates multiple services, including real VR gaming traffic to reflect practical scenarios of O-RAN slicing. We also propose and implement policy reuse and distillation-aided DRL and non-TL-aided DRL as three separate baselines. The proposed hybrid approach shows at least: 7.7% and 20.7% improvements in the average initial reward value and the percentage of converged scenarios, and a 64.6% decrease in reward variance while maintaining fast convergence and enhancing the generalizability compared with the baselines.
    Self-Supervised Blind Source Separation via Multi-Encoder Autoencoders. (arXiv:2309.07138v1 [eess.SP])
    The task of blind source separation (BSS) involves separating sources from a mixture without prior knowledge of the sources or the mixing system. This is a challenging problem that often requires making restrictive assumptions about both the mixing system and the sources. In this paper, we propose a novel method for addressing BSS of non-linear mixtures by leveraging the natural feature subspace specialization ability of multi-encoder autoencoders with fully self-supervised learning without strong priors. During the training phase, our method unmixes the input into the separate encoding spaces of the multi-encoder network and then remixes these representations within the decoder for a reconstruction of the input. Then to perform source inference, we introduce a novel encoding masking technique whereby masking out all but one of the encodings enables the decoder to estimate a source signal. To this end, we also introduce a so-called pathway separation loss that encourages sparsity between the unmixed encoding spaces throughout the decoder's layers and a so-called zero reconstruction loss on the decoder for coherent source estimations. In order to carefully evaluate our method, we conduct experiments on a toy dataset and with real-world biosignal recordings from a polysomnography sleep study for extracting respiration.
    Dataset Size Dependence of Rate-Distortion Curve and Threshold of Posterior Collapse in Linear VAE. (arXiv:2309.07663v1 [stat.ML])
    In the Variational Autoencoder (VAE), the variational posterior often aligns closely with the prior, which is known as posterior collapse and hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter beta has been introduced in the VAE. This paper presents a closed-form expression to assess the relationship between the beta in VAE, the dataset size, the posterior collapse, and the rate-distortion curve by analyzing a minimal VAE in a high-dimensional limit. These results clarify that a long plateau in the generalization error emerges with a relatively larger beta. As the beta increases, the length of the plateau extends and then becomes infinite beyond a certain beta threshold. This implies that the choice of beta, unlike the usual regularization parameters, can induce posterior collapse regardless of the dataset size. Thus, beta is a risky parameter that requires careful tuning. Furthermore, considering the dataset-size dependence on the rate-distortion curve, a relatively large dataset is required to obtain a rate-distortion curve with high rates. Extensive numerical experiments support our analysis.
    Efficient quantum recurrent reinforcement learning via quantum reservoir computing. (arXiv:2309.07339v1 [quant-ph])
    Quantum reinforcement learning (QRL) has emerged as a framework to solve sequential decision-making tasks, showcasing empirical quantum advantages. A notable development is through quantum recurrent neural networks (QRNNs) for memory-intensive tasks such as partially observable environments. However, QRL models incorporating QRNN encounter challenges such as inefficient training of QRL with QRNN, given that the computation of gradients in QRNN is both computationally expensive and time-consuming. This work presents a novel approach to address this challenge by constructing QRL agents utilizing QRNN-based reservoirs, specifically employing quantum long short-term memory (QLSTM). QLSTM parameters are randomly initialized and fixed without training. The model is trained using the asynchronous advantage actor-aritic (A3C) algorithm. Through numerical simulations, we validate the efficacy of our QLSTM-Reservoir RL framework. Its performance is assessed on standard benchmarks, demonstrating comparable results to a fully trained QLSTM RL model with identical architecture and training settings.
    Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ans\"atze. (arXiv:2112.07611v3 [quant-ph] UPDATED)
    We develop a theoretical framework for $S_n$-equivariant convolutional quantum circuits with SU$(d)$-symmetry, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism based on Schur-Weyl duality connecting both SU$(d)$ and $S_n$ actions on qudits. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$ and $S_n$ irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ans\"atze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA is able to generate any unitary in any given $S_n$ irrep sector, which may serve as a universal model for a wide array of quantum machine learning problems with the presence of SU($d$) symmetry. Our method provides another way to prove the universality of Quantum Approximate Optimization Algorithm (QAOA) and verifies that 4-local SU($d$) symmetric unitaries are sufficient to build generic SU($d$) symmetric quantum circuits up to relative phase factors. We present numerical simulations to showcase the effectiveness of the ans\"atze to find the ground state energy of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work provides the first application of the celebrated Okounkov-Vershik's $S_n$ representation theory to quantum physics and machine learning, from which to propose quantum variational ans\"atze that strongly suggests to be classically intractable tailored towards a specific optimization problem.
    Proximal Bellman mappings for reinforcement learning and their application to robust adaptive filtering. (arXiv:2309.07548v1 [eess.SP])
    This paper aims at the algorithmic/theoretical core of reinforcement learning (RL) by introducing the novel class of proximal Bellman mappings. These mappings are defined in reproducing kernel Hilbert spaces (RKHSs), to benefit from the rich approximation properties and inner product of RKHSs, they are shown to belong to the powerful Hilbertian family of (firmly) nonexpansive mappings, regardless of the values of their discount factors, and possess ample degrees of design freedom to even reproduce attributes of the classical Bellman mappings and to pave the way for novel RL designs. An approximate policy-iteration scheme is built on the proposed class of mappings to solve the problem of selecting online, at every time instance, the "optimal" exponent $p$ in a $p$-norm loss to combat outliers in linear adaptive filtering, without training data and any knowledge on the statistical properties of the outliers. Numerical tests on synthetic data showcase the superior performance of the proposed framework over several non-RL and kernel-based RL schemes.
    Efficient Learning of PDEs via Taylor Expansion and Sparse Decomposition into Value and Fourier Domains. (arXiv:2309.07344v1 [cs.LG])
    Accelerating the learning of Partial Differential Equations (PDEs) from experimental data will speed up the pace of scientific discovery. Previous randomized algorithms exploit sparsity in PDE updates for acceleration. However such methods are applicable to a limited class of decomposable PDEs, which have sparse features in the value domain. We propose Reel, which accelerates the learning of PDEs via random projection and has much broader applicability. Reel exploits the sparsity by decomposing dense updates into sparse ones in both the value and frequency domains. This decomposition enables efficient learning when the source of the updates consists of gradually changing terms across large areas (sparse in the frequency domain) in addition to a few rapid updates concentrated in a small set of "interfacial" regions (sparse in the value domain). Random projection is then applied to compress the sparse signals for learning. To expand the model applicability, Taylor series expansion is used in Reel to approximate the nonlinear PDE updates with polynomials in the decomposable form. Theoretically, we derive a constant factor approximation between the projected loss function and the original one with poly-logarithmic number of projected dimensions. Experimentally, we provide empirical evidence that our proposed Reel can lead to faster learning of PDE models (70-98% reduction in training time when the data is compressed to 1% of its original size) with comparable quality as the non-compressed models.
    Is Solving Graph Neural Tangent Kernel Equivalent to Training Graph Neural Network?. (arXiv:2309.07452v1 [cs.LG])
    A rising trend in theoretical deep learning is to understand why deep learning works through Neural Tangent Kernel (NTK) [jgh18], a kernel method that is equivalent to using gradient descent to train a multi-layer infinitely-wide neural network. NTK is a major step forward in the theoretical deep learning because it allows researchers to use traditional mathematical tools to analyze properties of deep neural networks and to explain various neural network techniques from a theoretical view. A natural extension of NTK on graph learning is \textit{Graph Neural Tangent Kernel (GNTK)}, and researchers have already provide GNTK formulation for graph-level regression and show empirically that this kernel method can achieve similar accuracy as GNNs on various bioinformatics datasets [dhs+19]. The remaining question now is whether solving GNTK regression is equivalent to training an infinite-wide multi-layer GNN using gradient descent. In this paper, we provide three new theoretical results. First, we formally prove this equivalence for graph-level regression. Second, we present the first GNTK formulation for node-level regression. Finally, we prove the equivalence for node-level regression.
    Rates of Convergence in Certain Native Spaces of Approximations used in Reinforcement Learning. (arXiv:2309.07383v1 [eess.SY])
    This paper studies convergence rates for some value function approximations that arise in a collection of reproducing kernel Hilbert spaces (RKHS) $H(\Omega)$. By casting an optimal control problem in a specific class of native spaces, strong rates of convergence are derived for the operator equation that enables offline approximations that appear in policy iteration. Explicit upper bounds on error in value function approximations are derived in terms of power function $\Pwr_{H,N}$ for the space of finite dimensional approximants $H_N$ in the native space $H(\Omega)$. These bounds are geometric in nature and refine some well-known, now classical results concerning convergence of approximations of value functions.
    GBE-MLZSL: A Group Bi-Enhancement Framework for Multi-Label Zero-Shot Learning. (arXiv:2309.00923v2 [cs.CV] UPDATED)
    This paper investigates a challenging problem of zero-shot learning in the multi-label scenario (MLZSL), wherein, the model is trained to recognize multiple unseen classes within a sample (e.g., an image) based on seen classes and auxiliary knowledge, e.g., semantic information. Existing methods usually resort to analyzing the relationship of various seen classes residing in a sample from the dimension of spatial or semantic characteristics, and transfer the learned model to unseen ones. But they ignore the effective integration of local and global features. That is, in the process of inferring unseen classes, global features represent the principal direction of the image in the feature space, while local features should maintain uniqueness within a certain range. This integrated neglect will make the model lose its grasp of the main components of the image. Relying only on the local existence of seen classes during the inference stage introduces unavoidable bias. In this paper, we propose a novel and effective group bi-enhancement framework for MLZSL, dubbed GBE-MLZSL, to fully make use of such properties and enable a more accurate and robust visual-semantic projection. Specifically, we split the feature maps into several feature groups, of which each feature group can be trained independently with the Local Information Distinguishing Module (LID) to ensure uniqueness. Meanwhile, a Global Enhancement Module (GEM) is designed to preserve the principal direction. Besides, a static graph structure is designed to construct the correlation of local features. Experiments on large-scale MLZSL benchmark datasets NUS-WIDE and Open-Images-v4 demonstrate that the proposed GBE-MLZSL outperforms other state-of-the-art methods with large margins.
    ETP: Learning Transferable ECG Representations via ECG-Text Pre-training. (arXiv:2309.07145v1 [eess.SP])
    In the domain of cardiovascular healthcare, the Electrocardiogram (ECG) serves as a critical, non-invasive diagnostic tool. Although recent strides in self-supervised learning (SSL) have been promising for ECG representation learning, these techniques often require annotated samples and struggle with classes not present in the fine-tuning stages. To address these limitations, we introduce ECG-Text Pre-training (ETP), an innovative framework designed to learn cross-modal representations that link ECG signals with textual reports. For the first time, this framework leverages the zero-shot classification task in the ECG domain. ETP employs an ECG encoder along with a pre-trained language model to align ECG signals with their corresponding textual reports. The proposed framework excels in both linear evaluation and zero-shot classification tasks, as demonstrated on the PTB-XL and CPSC2018 datasets, showcasing its ability for robust and generalizable cross-modal ECG feature learning.
    Bringing PDEs to JAX with forward and reverse modes automatic differentiation. (arXiv:2309.07137v1 [cs.MS])
    Partial differential equations (PDEs) are used to describe a variety of physical phenomena. Often these equations do not have analytical solutions and numerical approximations are used instead. One of the common methods to solve PDEs is the finite element method. Computing derivative information of the solution with respect to the input parameters is important in many tasks in scientific computing. We extend JAX automatic differentiation library with an interface to Firedrake finite element library. High-level symbolic representation of PDEs allows bypassing differentiating through low-level possibly many iterations of the underlying nonlinear solvers. Differentiating through Firedrake solvers is done using tangent-linear and adjoint equations. This enables the efficient composition of finite element solvers with arbitrary differentiable programs. The code is available at github.com/IvanYashchuk/jax-firedrake.
    Gibbs-Duhem-Informed Neural Networks for Binary Activity Coefficient Prediction. (arXiv:2306.07937v2 [physics.chem-ph] UPDATED)
    We propose Gibbs-Duhem-informed neural networks for the prediction of binary activity coefficients at varying compositions. That is, we include the Gibbs-Duhem equation explicitly in the loss function for training neural networks, which is straightforward in standard machine learning (ML) frameworks enabling automatic differentiation. In contrast to recent hybrid ML approaches, our approach does not rely on embedding a specific thermodynamic model inside the neural network and corresponding prediction limitations. Rather, Gibbs-Duhem consistency serves as regularization, with the flexibility of ML models being preserved. Our results show increased thermodynamic consistency and generalization capabilities for activity coefficient predictions by Gibbs-Duhem-informed graph neural networks and matrix completion methods. We also find that the model architecture, particularly the activation function, can have a strong influence on the prediction quality. The approach can be easily extended to account for other thermodynamic consistency conditions.
    MELAGE: A purely python based Neuroimaging software (Neonatal). (arXiv:2309.07175v1 [eess.IV])
    MELAGE, a pioneering Python-based neuroimaging software, emerges as a versatile tool for the visualization, processing, and analysis of medical images. Initially conceived to address the unique challenges of processing 3D ultrasound and MRI brain images during the neonatal period, MELAGE exhibits remarkable adaptability, extending its utility to the domain of adult human brain imaging. At its core, MELAGE features a semi-automatic brain extraction tool empowered by a deep learning module, ensuring precise and efficient brain structure extraction from MRI and 3D Ultrasound data. Moreover, MELAGE offers a comprehensive suite of features, encompassing dynamic 3D visualization, accurate measurements, and interactive image segmentation. This transformative software holds immense promise for researchers and clinicians, offering streamlined image analysis, seamless integration with deep learning algorithms, and broad applicability in the realm of medical imaging.
    EmbodiedGPT: Vision-Language Pre-Training via Embodied Chain of Thought. (arXiv:2305.15021v2 [cs.RO] UPDATED)
    Embodied AI is a crucial frontier in robotics, capable of planning and executing action sequences for robots to accomplish long-horizon tasks in physical environments. In this work, we introduce EmbodiedGPT, an end-to-end multi-modal foundation model for embodied AI, empowering embodied agents with multi-modal understanding and execution capabilities. To achieve this, we have made the following efforts: (i) We craft a large-scale embodied planning dataset, termed EgoCOT. The dataset consists of carefully selected videos from the Ego4D dataset, along with corresponding high-quality language instructions. Specifically, we generate a sequence of sub-goals with the "Chain of Thoughts" mode for effective embodied planning. (ii) We introduce an efficient training approach to EmbodiedGPT for high-quality plan generation, by adapting a 7B large language model (LLM) to the EgoCOT dataset via prefix tuning. (iii) We introduce a paradigm for extracting task-related features from LLM-generated planning queries to form a closed loop between high-level planning and low-level control. Extensive experiments show the effectiveness of EmbodiedGPT on embodied tasks, including embodied planning, embodied control, visual captioning, and visual question answering. Notably, EmbodiedGPT significantly enhances the success rate of the embodied control task by extracting more effective features. It has achieved a remarkable 1.6 times increase in success rate on the Franka Kitchen benchmark and a 1.3 times increase on the Meta-World benchmark, compared to the BLIP-2 baseline fine-tuned with the Ego4D dataset.
    EpiDeNet: An Energy-Efficient Approach to Seizure Detection for Embedded Systems. (arXiv:2309.07135v1 [eess.SP])
    Epilepsy is a prevalent neurological disorder that affects millions of individuals globally, and continuous monitoring coupled with automated seizure detection appears as a necessity for effective patient treatment. To enable long-term care in daily-life conditions, comfortable and smart wearable devices with long battery life are required, which in turn set the demand for resource-constrained and energy-efficient computing solutions. In this context, the development of machine learning algorithms for seizure detection faces the challenge of heavily imbalanced datasets. This paper introduces EpiDeNet, a new lightweight seizure detection network, and Sensitivity-Specificity Weighted Cross-Entropy (SSWCE), a new loss function that incorporates sensitivity and specificity, to address the challenge of heavily unbalanced datasets. The proposed EpiDeNet-SSWCE approach demonstrates the successful detection of 91.16% and 92.00% seizure events on two different datasets (CHB-MIT and PEDESITE, respectively), with only four EEG channels. A three-window majority voting-based smoothing scheme combined with the SSWCE loss achieves 3x reduction of false positives to 1.18 FP/h. EpiDeNet is well suited for implementation on low-power embedded platforms, and we evaluate its performance on two ARM Cortex-based platforms (M4F/M7) and two parallel ultra-low power (PULP) systems (GAP8, GAP9). The most efficient implementation (GAP9) achieves an energy efficiency of 40 GMAC/s/W, with an energy consumption per inference of only 0.051 mJ at high performance (726.46 MMAC/s), outperforming the best ARM Cortex-based solutions by approximately 160x in energy efficiency. The EpiDeNet-SSWCE method demonstrates effective and accurate seizure detection performance on heavily imbalanced datasets, while being suited for implementation on energy-constrained platforms.
    Detecting Unknown Attacks in IoT Environments: An Open Set Classifier for Enhanced Network Intrusion Detection. (arXiv:2309.07461v1 [cs.CR])
    The widespread integration of Internet of Things (IoT) devices across all facets of life has ushered in an era of interconnectedness, creating new avenues for cybersecurity challenges and underscoring the need for robust intrusion detection systems. However, traditional security systems are designed with a closed-world perspective and often face challenges in dealing with the ever-evolving threat landscape, where new and unfamiliar attacks are constantly emerging. In this paper, we introduce a framework aimed at mitigating the open set recognition (OSR) problem in the realm of Network Intrusion Detection Systems (NIDS) tailored for IoT environments. Our framework capitalizes on image-based representations of packet-level data, extracting spatial and temporal patterns from network traffic. Additionally, we integrate stacking and sub-clustering techniques, enabling the identification of unknown attacks by effectively modeling the complex and diverse nature of benign behavior. The empirical results prominently underscore the framework's efficacy, boasting an impressive 88\% detection rate for previously unseen attacks when compared against existing approaches and recent advancements. Future work will perform extensive experimentation across various openness levels and attack scenarios, further strengthening the adaptability and performance of our proposed solution in safeguarding IoT environments.
    Choosing a Proxy Metric from Past Experiments. (arXiv:2309.07893v1 [stat.ME])
    In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric -- so they can be used to effectively guide decision-making in the near-term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and noise level of experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not apriori fixed; rather it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.
    Attention-based Dynamic Graph Convolutional Recurrent Neural Network for Traffic Flow Prediction in Highway Transportation. (arXiv:2309.07196v1 [cs.LG])
    As one of the important tools for spatial feature extraction, graph convolution has been applied in a wide range of fields such as traffic flow prediction. However, current popular works of graph convolution cannot guarantee spatio-temporal consistency in a long period. The ignorance of correlational dynamics, convolutional locality and temporal comprehensiveness would limit predictive accuracy. In this paper, a novel Attention-based Dynamic Graph Convolutional Recurrent Neural Network (ADGCRNN) is proposed to improve traffic flow prediction in highway transportation. Three temporal resolutions of data sequence are effectively integrated by self-attention to extract characteristics; multi-dynamic graphs and their weights are dynamically created to compliantly combine the varying characteristics; a dedicated gated kernel emphasizing highly relative nodes is introduced on these complete graphs to reduce overfitting for graph convolution operations. Experiments on two public datasets show our work better than state-of-the-art baselines, and case studies of a real Web system prove practical benefit in highway transportation.
    User Training with Error Augmentation for Electromyogram-based Gesture Classification. (arXiv:2309.07289v1 [cs.HC])
    We designed and tested a system for real-time control of a user interface by extracting surface electromyographic (sEMG) activity from eight electrodes in a wrist-band configuration. sEMG data were streamed into a machine-learning algorithm that classified hand gestures in real-time. After an initial model calibration, participants were presented with one of three types of feedback during a human-learning stage: veridical feedback, in which predicted probabilities from the gesture classification algorithm were displayed without alteration, modified feedback, in which we applied a hidden augmentation of error to these probabilities, and no feedback. User performance was then evaluated in a series of minigames, in which subjects were required to use eight gestures to manipulate their game avatar to complete a task. Experimental results indicated that, relative to baseline, the modified feedback condition led to significantly improved accuracy and improved gesture class separation. These findings suggest that real-time feedback in a gamified user interface with manipulation of feedback may enable intuitive, rapid, and accurate task acquisition for sEMG-based gesture recognition applications.
    Optimal and Fair Encouragement Policy Evaluation and Learning. (arXiv:2309.07176v1 [cs.LG])
    In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints such as demographic parity in treatment take-up, and other constraints, via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in two case studies based on data from randomized encouragement to enroll in insurance and from pretrial supervised release with electronic monitoring.
    Deep Spatiotemporal Clustering: A Temporal Clustering Approach for Multi-dimensional Climate Data. (arXiv:2304.14541v2 [cs.LG] UPDATED)
    Clustering high-dimensional spatiotemporal data using an unsupervised approach is a challenging problem for many data-driven applications. Existing state-of-the-art methods for unsupervised clustering use different similarity and distance functions but focus on either spatial or temporal features of the data. Concentrating on joint deep representation learning of spatial and temporal features, we propose Deep Spatiotemporal Clustering (DSC), a novel algorithm for the temporal clustering of high-dimensional spatiotemporal data using an unsupervised deep learning method. Inspired by the U-net architecture, DSC utilizes an autoencoder integrating CNN-RNN layers to learn latent representations of the spatiotemporal data. DSC also includes a unique layer for cluster assignment on latent representations that uses the Student's t-distribution. By optimizing the clustering loss and data reconstruction loss simultaneously, the algorithm gradually improves clustering assignments and the nonlinear mapping between low-dimensional latent feature space and high-dimensional original data space. A multivariate spatiotemporal climate dataset is used to evaluate the efficacy of the proposed method. Our extensive experiments show our approach outperforms both conventional and deep learning-based unsupervised clustering algorithms. Additionally, we compared the proposed model with its various variants (CNN encoder, CNN autoencoder, CNN-RNN encoder, CNN-RNN autoencoder, etc.) to get insight into using both the CNN and RNN layers in the autoencoder, and our proposed technique outperforms these variants in terms of clustering results.
    Short-term power load forecasting method based on CNN-SAEDN-Res. (arXiv:2309.07140v1 [eess.SP])
    In deep learning, the load data with non-temporal factors are difficult to process by sequence models. This problem results in insufficient precision of the prediction. Therefore, a short-term load forecasting method based on convolutional neural network (CNN), self-attention encoder-decoder network (SAEDN) and residual-refinement (Res) is proposed. In this method, feature extraction module is composed of a two-dimensional convolutional neural network, which is used to mine the local correlation between data and obtain high-dimensional data features. The initial load fore-casting module consists of a self-attention encoder-decoder network and a feedforward neural network (FFN). The module utilizes self-attention mechanisms to encode high-dimensional features. This operation can obtain the global correlation between data. Therefore, the model is able to retain important information based on the coupling relationship between the data in data mixed with non-time series factors. Then, self-attention decoding is per-formed and the feedforward neural network is used to regression initial load. This paper introduces the residual mechanism to build the load optimization module. The module generates residual load values to optimize the initial load. The simulation results show that the proposed load forecasting method has advantages in terms of prediction accuracy and prediction stability.  ( 2 min )
    TargetCall: Eliminating the Wasted Computation in Basecalling via Pre-Basecalling Filtering. (arXiv:2212.04953v2 [q-bio.GN] UPDATED)
    Basecalling is an essential step in nanopore sequencing analysis where the raw signals of nanopore sequencers are converted into nucleotide sequences, i.e., reads. State-of-the-art basecallers employ complex deep learning models to achieve high basecalling accuracy. This makes basecalling computationally-inefficient and memory-hungry; bottlenecking the entire genome analysis pipeline. However, for many applications, the majority of reads do no match the reference genome of interest (i.e., target reference) and thus are discarded in later steps in the genomics pipeline, wasting the basecalling computation. To overcome this issue, we propose TargetCall, the first pre-basecalling filter to eliminate the wasted computation in basecalling. TargetCall's key idea is to discard reads that will not match the target reference (i.e., off-target reads) prior to basecalling. TargetCall consists of two main components: (1) LightCall, a lightweight neural network basecaller that produces noisy reads; and (2) Similarity Check, which labels each of these noisy reads as on-target or off-target by matching them to the target reference. TargetCall aims to filter out all off-target reads before basecalling. The highly-accurate but slow basecalling is performed only on the raw signals whose noisy reads are labeled as on-target. Our thorough experimental evaluations using both real and simulated data show that TargetCall 1) improves the end-to-end basecalling performance while maintaining high sensitivity in keeping on-target reads, 2) maintains high accuracy in downstream analysis, 3) precisely filters out up to 94.71% of off-target reads, and 4) achieves better performance, throughput, sensitivity, precision, and generality compared to prior works. We open-source TargetCall at https://github.com/CMU-SAFARI/TargetCall  ( 3 min )
    Contrastive Tuning: A Little Help to Make Masked Autoencoders Forget. (arXiv:2304.10520v2 [cs.CV] UPDATED)
    Masked Image Modeling (MIM) methods, like Masked Autoencoders (MAE), efficiently learn a rich representation of the input. However, for adapting to downstream tasks, they require a sufficient amount of labeled data since their rich features code not only objects but also less relevant image background. In contrast, Instance Discrimination (ID) methods focus on objects. In this work, we study how to combine the efficiency and scalability of MIM with the ability of ID to perform downstream classification in the absence of large amounts of labeled data. To this end, we introduce Masked Autoencoder Contrastive Tuning (MAE-CT), a sequential approach that utilizes the implicit clustering of the Nearest Neighbor Contrastive Learning (NNCLR) objective to induce abstraction in the topmost layers of a pre-trained MAE. MAE-CT tunes the rich features such that they form semantic clusters of objects without using any labels. Notably, MAE-CT does not rely on hand-crafted augmentations and frequently achieves its best performances while using only minimal augmentations (crop & flip). Further, MAE-CT is compute efficient as it requires at most 10% overhead compared to MAE re-training. Applied to large and huge Vision Transformer (ViT) models, MAE-CT excels over previous self-supervised methods trained on ImageNet in linear probing, k-NN and low-shot classification accuracy as well as in unsupervised clustering accuracy. With ViT-H/16 MAE-CT achieves a new state-of-the-art in linear probing of 82.2%.  ( 3 min )
    All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks. (arXiv:2309.07250v1 [quant-ph])
    Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ans\"atze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.  ( 2 min )
    TempEE: Temporal-Spatial Parallel Transformer for Radar Echo Extrapolation Beyond Auto-Regression. (arXiv:2304.14131v2 [eess.SP] UPDATED)
    Meteorological radar reflectivity data (i.e. radar echo) significantly influences precipitation prediction. It can facilitate accurate and expeditious forecasting of short-term heavy rainfall bypassing the need for complex Numerical Weather Prediction (NWP) models. In comparison to conventional models, Deep Learning (DL)-based radar echo extrapolation algorithms exhibit higher effectiveness and efficiency. Nevertheless, the development of reliable and generalized echo extrapolation algorithm is impeded by three primary challenges: cumulative error spreading, imprecise representation of sparsely distributed echoes, and inaccurate description of non-stationary motion processes. To tackle these challenges, this paper proposes a novel radar echo extrapolation algorithm called Temporal-Spatial Parallel Transformer, referred to as TempEE. TempEE avoids using auto-regression and instead employs a one-step forward strategy to prevent cumulative error spreading during the extrapolation process. Additionally, we propose the incorporation of a Multi-level Temporal-Spatial Attention mechanism to improve the algorithm's capability of capturing both global and local information while emphasizing task-related regions, including sparse echo representations, in an efficient manner. Furthermore, the algorithm extracts spatio-temporal representations from continuous echo images using a parallel encoder to model the non-stationary motion process for echo extrapolation. The superiority of our TempEE has been demonstrated in the context of the classic radar echo extrapolation task, utilizing a real-world dataset. Extensive experiments have further validated the efficacy and indispensability of various components within TempEE.  ( 3 min )
    Survival Estimation for Missing not at Random Censoring Indicators based on Copula Models. (arXiv:2009.01726v2 [stat.ML] UPDATED)
    In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also known as the Beran estimator) consistently estimates the conditional survival function of the random follow-up for the event of interest. However, a necessary condition is the unambiguous knowledge of whether each individual is censored or not, which may be incomplete in practice. We therefore propose a study of the Beran estimator when the censoring indicators are generic random variables and discuss necessary conditions for the efficiency of the Beran estimator. From this, we provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism. In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.  ( 2 min )
    Turning Dross Into Gold Loss: is BERT4Rec really better than SASRec?. (arXiv:2309.07602v1 [cs.IR])
    Recently sequential recommendations and next-item prediction task has become increasingly popular in the field of recommender systems. Currently, two state-of-the-art baselines are Transformer-based models SASRec and BERT4Rec. Over the past few years, there have been quite a few publications comparing these two algorithms and proposing new state-of-the-art models. In most of the publications, BERT4Rec achieves better performance than SASRec. But BERT4Rec uses cross-entropy over softmax for all items, while SASRec uses negative sampling and calculates binary cross-entropy loss for one positive and one negative item. In our work, we show that if both models are trained with the same loss, which is used by BERT4Rec, then SASRec will significantly outperform BERT4Rec both in terms of quality and training speed. In addition, we show that SASRec could be effectively trained with negative sampling and still outperform BERT4Rec, but the number of negative examples should be much larger than one.  ( 2 min )
    Solving Recurrence Relations using Machine Learning, with Application to Cost Analysis. (arXiv:2309.07259v1 [cs.PL])
    Automatic static cost analysis infers information about the resources used by programs without actually running them with concrete data, and presents such information as functions of input data sizes. Most of the analysis tools for logic programs (and other languages) are based on setting up recurrence relations representing (bounds on) the computational cost of predicates, and solving them to find closed-form functions that are equivalent to (or a bound on) them. Such recurrence solving is a bottleneck in current tools: many of the recurrences that arise during the analysis cannot be solved with current solvers, such as Computer Algebra Systems (CASs), so that specific methods for different classes of recurrences need to be developed. We address such a challenge by developing a novel, general approach for solving arbitrary, constrained recurrence relations, that uses machine-learning sparse regression techniques to guess a candidate closed-form function, and a combination of an SMT-solver and a CAS to check whether such function is actually a solution of the recurrence. We have implemented a prototype and evaluated it with recurrences generated by a cost analysis system (the one in CiaoPP). The experimental results are quite promising, showing that our approach can find closed-form solutions, in a reasonable time, for classes of recurrences that cannot be solved by such a system, nor by current CASs.  ( 3 min )
    Communication Efficient Private Federated Learning Using Dithering. (arXiv:2309.07809v1 [cs.LG])
    The task of preserving privacy while ensuring efficient communication is a fundamental challenge in federated learning. In this work, we tackle this challenge in the trusted aggregator model, and propose a solution that achieves both objectives simultaneously. We show that employing a quantization scheme based on subtractive dithering at the clients can effectively replicate the normal noise addition process at the aggregator. This implies that we can guarantee the same level of differential privacy against other clients while substantially reducing the amount of communication required, as opposed to transmitting full precision gradients and using central noise addition. We also experimentally demonstrate that the accuracy of our proposed approach matches that of the full precision gradient method.  ( 2 min )
    Hodge-Aware Contrastive Learning. (arXiv:2309.07364v1 [cs.LG])
    Simplicial complexes prove effective in modeling data with multiway dependencies, such as data defined along the edges of networks or within other higher-order structures. Their spectrum can be decomposed into three interpretable subspaces via the Hodge decomposition, resulting foundational in numerous applications. We leverage this decomposition to develop a contrastive self-supervised learning approach for processing simplicial data and generating embeddings that encapsulate specific spectral information.Specifically, we encode the pertinent data invariances through simplicial neural networks and devise augmentations that yield positive contrastive examples with suitable spectral properties for downstream tasks. Additionally, we reweight the significance of negative examples in the contrastive loss, considering the similarity of their Hodge components to the anchor. By encouraging a stronger separation among less similar instances, we obtain an embedding space that reflects the spectral properties of the data. The numerical results on two standard edge flow classification tasks show a superior performance even when compared to supervised learning techniques. Our findings underscore the importance of adopting a spectral perspective for contrastive learning with higher-order data.  ( 2 min )
    Variational Quantum Linear Solver enhanced Quantum Support Vector Machine. (arXiv:2309.07770v1 [quant-ph])
    Quantum Support Vector Machines (QSVM) play a vital role in using quantum resources for supervised machine learning tasks, such as classification. However, current methods are strongly limited in terms of scalability on Noisy Intermediate Scale Quantum (NISQ) devices. In this work, we propose a novel approach called the Variational Quantum Linear Solver (VQLS) enhanced QSVM. This is built upon our idea of utilizing the variational quantum linear solver to solve system of linear equations of a least squares-SVM on a NISQ device. The implementation of our approach is evaluated by an extensive series of numerical experiments with the Iris dataset, which consists of three distinct iris plant species. Based on this, we explore the practicality and effectiveness of our algorithm by constructing a classifier capable of classification in a feature space ranging from one to seven dimensions. Furthermore, by strategically exploiting both classical and quantum computing for various subroutines of our algorithm, we effectively mitigate practical challenges associated with the implementation. These include significant improvement in the trainability of the variational ansatz and notable reductions in run-time for cost calculations. Based on the numerical experiments, our approach exhibits the capability of identifying a separating hyperplane in an 8-dimensional feature space. Moreover, it consistently demonstrated strong performance across various instances with the same dataset.  ( 2 min )
    When Are Two Lists Better than One?: Benefits and Harms in Joint Decision-making. (arXiv:2308.11721v2 [cs.LG] UPDATED)
    Historically, much of machine learning research has focused on the performance of the algorithm alone, but recently more attention has been focused on optimizing joint human-algorithm performance. Here, we analyze a specific type of human-algorithm collaboration where the algorithm has access to a set of $n$ items, and presents a subset of size $k$ to the human, who selects a final item from among those $k$. This scenario could model content recommendation, route planning, or any type of labeling task. Because both the human and algorithm have imperfect, noisy information about the true ordering of items, the key question is: which value of $k$ maximizes the probability that the best item will be ultimately selected? For $k=1$, performance is optimized by the algorithm acting alone, and for $k=n$ it is optimized by the human acting alone. Surprisingly, we show that for multiple of noise models, it is optimal to set $k \in [2, n-1]$ - that is, there are strict benefits to collaborating, even when the human and algorithm have equal accuracy separately. We demonstrate this theoretically for the Mallows model and experimentally for the Random Utilities models of noisy permutations. However, we show this pattern is reversed when the human is anchored on the algorithm's presented ordering - the joint system always has strictly worse performance. We extend these results to the case where the human and algorithm differ in their accuracy levels, showing that there always exist regimes where a more accurate agent would strictly benefit from collaborating with a less accurate one, but these regimes are asymmetric between the human and the algorithm's accuracy.  ( 3 min )
    Predicting Survival Time of Ball Bearings in the Presence of Censoring. (arXiv:2309.07188v1 [eess.SP])
    Ball bearings find widespread use in various manufacturing and mechanical domains, and methods based on machine learning have been widely adopted in the field to monitor wear and spot defects before they lead to failures. Few studies, however, have addressed the problem of censored data, in which failure is not observed. In this paper, we propose a novel approach to predict the time to failure in ball bearings using survival analysis. First, we analyze bearing data in the frequency domain and annotate when a bearing fails by comparing the Kullback-Leibler divergence and the standard deviation between its break-in frequency bins and its break-out frequency bins. Second, we train several survival models to estimate the time to failure based on the annotated data and covariates extracted from the time domain, such as skewness, kurtosis and entropy. The models give a probabilistic prediction of risk over time and allow us to compare the survival function between groups of bearings. We demonstrate our approach on the XJTU and PRONOSTIA datasets. On XJTU, the best result is a 0.70 concordance-index and 0.21 integrated Brier score. On PRONOSTIA, the best is a 0.76 concordance-index and 0.19 integrated Brier score. Our work motivates further work on incorporating censored data in models for predictive maintenance.  ( 2 min )
    On a continuous time model of gradient descent dynamics and instability in deep learning. (arXiv:2302.01952v3 [stat.ML] UPDATED)
    The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.  ( 2 min )
    Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck. (arXiv:2309.07200v1 [cs.LG])
    Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.  ( 2 min )
    Physically Plausible Full-Body Hand-Object Interaction Synthesis. (arXiv:2309.07907v1 [cs.RO])
    We propose a physics-based method for synthesizing dexterous hand-object interactions in a full-body setting. While recent advancements have addressed specific facets of human-object interactions, a comprehensive physics-based approach remains a challenge. Existing methods often focus on isolated segments of the interaction process and rely on data-driven techniques that may result in artifacts. In contrast, our proposed method embraces reinforcement learning (RL) and physics simulation to mitigate the limitations of data-driven approaches. Through a hierarchical framework, we first learn skill priors for both body and hand movements in a decoupled setting. The generic skill priors learn to decode a latent skill embedding into the motion of the underlying part. A high-level policy then controls hand-object interactions in these pretrained latent spaces, guided by task objectives of grasping and 3D target trajectory following. It is trained using a novel reward function that combines an adversarial style term with a task reward, encouraging natural motions while fulfilling the task incentives. Our method successfully accomplishes the complete interaction task, from approaching an object to grasping and subsequent manipulation. We compare our approach against kinematics-based baselines and show that it leads to more physically plausible motions.  ( 2 min )
    VerilogEval: Evaluating Large Language Models for Verilog Code Generation. (arXiv:2309.07544v1 [cs.LG])
    The increasing popularity of large language models (LLMs) has paved the way for their application in diverse domains. This paper proposes a benchmarking framework tailored specifically for evaluating LLM performance in the context of Verilog code generation for hardware design and verification. We present a comprehensive evaluation dataset consisting of 156 problems from the Verilog instructional website HDLBits. The evaluation set consists of a diverse set of Verilog code generation tasks, ranging from simple combinational circuits to complex finite state machines. The Verilog code completions can be automatically tested for functional correctness by comparing the transient simulation outputs of the generated design with a golden solution. We also demonstrate that the Verilog code generation capability of pretrained language models could be improved with supervised fine-tuning by bootstrapping with LLM generated synthetic problem-code pairs.  ( 2 min )
    Machine Learning and Computer Vision Techniques in Continuous Beehive Monitoring Applications: A survey. (arXiv:2208.00085v3 [cs.CV] UPDATED)
    Wide use and availability of the machine learning and computer vision techniques allows development of relatively complex monitoring systems in many domains. Besides the traditional industrial domain, new application appears also in biology and agriculture, where we could speak about the detection of infections, parasites and weeds, but also about automated monitoring and early warning systems. This is also connected with the introduction of the easily accessible hardware and development kits such as Arduino, or RaspberryPi family. In this paper, we survey 50 existing papers focusing on the methods of automated beehive monitoring methods using the computer vision techniques, particularly on the pollen and Varroa mite detection together with the bee traffic monitoring. Such systems could also be used for the monitoring of the honeybee colonies and for the inspection of their health state, which could identify potentially dangerous states before the situation is critical, or to better plan periodic bee colony inspections and therefore save significant costs. Later, we also include analysis of the research trends in this application field and we outline the possible direction of the new explorations. Our paper is aimed also at veterinary and apidology professionals and experts, who might not be familiar with machine learning to introduce them to its possibilities, therefore each family of applications is opened by a brief theoretical introduction and motivation related to its base method. We hope that this paper will inspire other scientists to use machine learning techniques for other applications in beehive monitoring.  ( 3 min )
    Kernel Conditional Moment Constraints for Confounding Robust Inference. (arXiv:2302.13348v2 [stat.ML] UPDATED)
    We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value. It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators failed to take into account. In the theoretical analysis, we provide a condition for the choice of the kernel which guarantees no specification error that biases the lower bound estimation. Furthermore, we provide consistency guarantees of policy evaluation and learning. In the experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.  ( 2 min )
    EarthPT: a foundation model for Earth Observation. (arXiv:2309.07207v1 [cs.LG])
    We introduce EarthPT -- an Earth Observation (EO) pretrained transformer. EarthPT is a 700 million parameter decoding transformer foundation model trained in an autoregressive self-supervised manner and developed specifically with EO use-cases in mind. We demonstrate that EarthPT is an effective forecaster that can accurately predict future pixel-level surface reflectances across the 400-2300 nm range well into the future. For example, forecasts of the evolution of the Normalised Difference Vegetation Index (NDVI) have a typical error of approximately 0.05 (over a natural range of -1 -> 1) at the pixel level over a five month test set horizon, out-performing simple phase-folded models based on historical averaging. We also demonstrate that embeddings learnt by EarthPT hold semantically meaningful information and could be exploited for downstream tasks such as highly granular, dynamic land use classification. Excitingly, we note that the abundance of EO data provides us with -- in theory -- quadrillions of training tokens. Therefore, if we assume that EarthPT follows neural scaling laws akin to those derived for Large Language Models (LLMs), there is currently no data-imposed limit to scaling EarthPT and other similar `Large Observation Models.'  ( 2 min )
    Learning Quasi-Static 3D Models of Markerless Deformable Linear Objects for Bimanual Robotic Manipulation. (arXiv:2309.07609v1 [cs.RO])
    The robotic manipulation of Deformable Linear Objects (DLOs) is a vital and challenging task that is important in many practical applications. Classical model-based approaches to this problem require an accurate model to capture how robot motions affect the deformation of the DLO. Nowadays, data-driven models offer the best tradeoff between quality and computation time. This paper analyzes several learning-based 3D models of the DLO and proposes a new one based on the Transformer architecture that achieves superior accuracy, even on the DLOs of different lengths, thanks to the proposed scaling method. Moreover, we introduce a data augmentation technique, which improves the prediction performance of almost all considered DLO data-driven models. Thanks to this technique, even a simple Multilayer Perceptron (MLP) achieves close to state-of-the-art performance while being significantly faster to evaluate. In the experiments, we compare the performance of the learning-based 3D models of the DLO on several challenging datasets quantitatively and demonstrate their applicability in the task of shaping a DLO.  ( 2 min )
    PolicyCleanse: Backdoor Detection and Mitigation in Reinforcement Learning. (arXiv:2202.03609v5 [cs.LG] UPDATED)
    While real-world applications of reinforcement learning are becoming popular, the security and robustness of RL systems are worthy of more attention and exploration. In particular, recent works have revealed that, in a multi-agent RL environment, backdoor trigger actions can be injected into a victim agent (a.k.a. Trojan agent), which can result in a catastrophic failure as soon as it sees the backdoor trigger action. To ensure the security of RL agents against malicious backdoors, in this work, we propose the problem of Backdoor Detection in a multi-agent competitive reinforcement learning system, with the objective of detecting Trojan agents as well as the corresponding potential trigger actions, and further trying to mitigate their Trojan behavior. In order to solve this problem, we propose PolicyCleanse that is based on the property that the activated Trojan agents accumulated rewards degrade noticeably after several timesteps. Along with PolicyCleanse, we also design a machine unlearning-based approach that can effectively mitigate the detected backdoor. Extensive experiments demonstrate that the proposed methods can accurately detect Trojan agents, and outperform existing backdoor mitigation baseline approaches by at least 3% in winning rate across various types of agents and environments.  ( 3 min )
    Learning From Drift: Federated Learning on Non-IID Data via Drift Regularization. (arXiv:2309.07189v1 [cs.LG])
    Federated learning algorithms perform reasonably well on independent and identically distributed (IID) data. They, on the other hand, suffer greatly from heterogeneous environments, i.e., Non-IID data. Despite the fact that many research projects have been done to address this issue, recent findings indicate that they are still sub-optimal when compared to training on IID data. In this work, we carefully analyze the existing methods in heterogeneous environments. Interestingly, we find that regularizing the classifier's outputs is quite effective in preventing performance degradation on Non-IID data. Motivated by this, we propose Learning from Drift (LfD), a novel method for effectively training the model in heterogeneous settings. Our scheme encapsulates two key components: drift estimation and drift regularization. Specifically, LfD first estimates how different the local model is from the global model (i.e., drift). The local model is then regularized such that it does not fall in the direction of the estimated drift. In the experiment, we evaluate each method through the lens of the five aspects of federated learning, i.e., Generalization, Heterogeneity, Scalability, Forgetting, and Efficiency. Comprehensive evaluation results clearly support the superiority of LfD in federated learning with Non-IID data.  ( 2 min )
    Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech. (arXiv:2110.12539v3 [cs.SD] UPDATED)
    We present a Split Vector Quantized Variational Autoencoder (SVQ-VAE) architecture using a split vector quantizer for NTTS, as an enhancement to the well-known Variational Autoencoder (VAE) and Vector Quantized Variational Autoencoder (VQ-VAE) architectures. Compared to these previous architectures, our proposed model retains the benefits of using an utterance-level bottleneck, while keeping significant representation power and a discretized latent space small enough for efficient prediction from text. We train the model on recordings in the expressive task-oriented dialogues domain and show that SVQ-VAE achieves a statistically significant improvement in naturalness over the VAE and VQ-VAE models. Furthermore, we demonstrate that the SVQ-VAE latent acoustic space is predictable from text, reducing the gap between the standard constant vector synthesis and vocoded recordings by 32%.  ( 2 min )
    Using wearable device-based machine learning models to autonomously identify older adults with poor cognition. (arXiv:2309.07133v1 [eess.SP])
    Conducting cognitive tests is time-consuming for patients and clinicians. Wearable device-based prediction models allow for continuous health monitoring under normal living conditions and could offer an alternative to identifying older adults with cognitive impairments for early interventions. In this study, we first derived novel wearable-based features related to circadian rhythms, ambient light exposure, physical activity levels, sleep, and signal processing. Then, we quantified the ability of wearable-based machine-learning models to predict poor cognition based on outcomes from the Digit Symbol Substitution Test (DSST), the Consortium to Establish a Registry for Alzheimers Disease Word-Learning subtest (CERAD-WL), and the Animal Fluency Test (AFT). We found that the wearable-based models had significantly higher AUCs when predicting all three cognitive outcomes compared to benchmark models containing age, sex, education, marital status, household income, diabetic status, depression symptoms, and functional independence scores. In addition to uncovering previously unidentified wearable-based features that are predictive of poor cognition such as the standard deviation of the midpoints of each persons most active 10-hour periods and least active 5-hour periods, our paper provides proof-of-concept that wearable-based machine learning models can be used to autonomously screen older adults for possible cognitive impairments. Such models offer cost-effective alternatives to conducting initial screenings manually in clinical settings.  ( 2 min )
    MAHTM: A Multi-Agent Framework for Hierarchical Transactive Microgrids. (arXiv:2303.08447v2 [cs.LG] UPDATED)
    Integrating variable renewable energy into the grid has posed challenges to system operators in achieving optimal trade-offs among energy availability, cost affordability, and pollution controllability. This paper proposes a multi-agent reinforcement learning framework for managing energy transactions in microgrids. The framework addresses the challenges above: it seeks to optimize the usage of available resources by minimizing the carbon footprint while benefiting all stakeholders. The proposed architecture consists of three layers of agents, each pursuing different objectives. The first layer, comprised of prosumers and consumers, minimizes the total energy cost. The other two layers control the energy price to decrease the carbon impact while balancing the consumption and production of both renewable and conventional energy. This framework also takes into account fluctuations in energy demand and supply.  ( 2 min )
    BAFFLE: Backdoor Attack in Offline Reinforcement Learning. (arXiv:2210.04688v2 [cs.LG] UPDATED)
    A growing body of research has focused on the Reinforcement Learning (RL) methods which allow the agent to learn from trial-and-error experiences gathered during the interaction with the environment. Recently, offline RL becomes a popular RL paradigm because it saves the interactions with environments. In offline RL, data providers share large pre-collected datasets, and others can train high-quality agents without interacting with the environments. This paradigm has demonstrated effectiveness in critical tasks like robot control, autonomous driving, etc. However, less attention is paid to investigating the security threats to the offline RL system. This paper focuses on backdoor attacks, where some perturbations are added to the data (observations) such that given normal observations, the agent takes high-rewards actions, and low-reward actions on observations injected with triggers. In this paper, we propose Baffle (Backdoor Attack for Offline Reinforcement Learning), an approach that automatically implants backdoors to RL agents by poisoning the offline RL dataset, and evaluate how different offline RL algorithms react to this attack. Our experiments conducted on four tasks and four offline RL algorithms expose a disquieting fact: none of the existing offline RL algorithms is immune to such a backdoor attack. Baffle modifies $10\%$ of the datasets for four tasks. Agents trained on the poisoned datasets perform well in normal settings. However, when triggers are presented, the agents' performance decreases drastically by $63.2\%$, $53.9\%$, $64.7\%$, and $47.4\%$ in the four tasks on average. The backdoor still persists after fine-tuning poisoned agents on clean datasets. We further show that the inserted backdoor is also hard to be detected by a popular defensive method. This paper calls attention to developing more effective protection for the open-source offline RL dataset.  ( 3 min )
    Gaussian Process Surrogate Models for Neural Networks. (arXiv:2208.06028v2 [cs.LG] UPDATED)
    Not being able to understand and predict the behavior of deep learning systems makes it hard to decide what architecture and algorithm to use for a given problem. In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. Modeling replaces a complex system with a simpler, more interpretable surrogate. Drawing inspiration from this, we construct a class of surrogate models for neural networks using Gaussian processes. Rather than deriving kernels for infinite neural networks, we learn kernels empirically from the naturalistic behavior of finite neural networks. We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems such as identifying which points most influence the behavior of specific neural networks and predicting which architectures and algorithms will generalize well for specific datasets.  ( 2 min )
    Some notes concerning a generalized KMM-type optimization method for density ratio estimation. (arXiv:2309.07887v1 [cs.LG])
    In the present paper we introduce new optimization algorithms for the task of density ratio estimation. More precisely, we consider extending the well-known KMM method using the construction of a suitable loss function, in order to encompass more general situations involving the estimation of density ratio with respect to subsets of the training data and test data, respectively. The associated codes can be found at https://github.com/CDAlecsa/Generalized-KMM.  ( 2 min )
    Correcting sampling biases via importance reweighting for spatial modeling. (arXiv:2309.04824v2 [cs.LG] UPDATED)
    In machine learning models, the estimation of errors is often complex due to distribution bias, particularly in spatial data such as those found in environmental studies. We introduce an approach based on the ideas of importance sampling to obtain an unbiased estimate of the target error. By taking into account difference between desirable error and available data, our method reweights errors at each sample point and neutralizes the shift. Importance sampling technique and kernel density estimation were used for reweighteing. We validate the effectiveness of our approach using artificial data that resemble real-world spatial datasets. Our findings demonstrate advantages of the proposed approach for the estimation of the target error, offering a solution to a distribution shift problem. Overall error of predictions dropped from 7% to just 2% and it gets smaller for larger samples.  ( 2 min )
    Domain Generalization for Crop Segmentation with Knowledge Distillation. (arXiv:2304.01029v2 [cs.CV] UPDATED)
    In recent years, precision agriculture has gradually oriented farming closer to automation processes to support all the activities related to field management. Service robotics plays a predominant role in this evolution by deploying autonomous agents that can navigate fields while performing tasks without human intervention, such as monitoring, spraying, and harvesting. To execute these precise actions, mobile robots need a real-time perception system that understands their surroundings and identifies their targets in the wild. Generalizing to new crops and environmental conditions is critical for practical applications, as labeled samples are rarely available. In this paper, we investigate the problem of crop segmentation and propose a novel approach to enhance domain generalization using knowledge distillation. In the proposed framework, we transfer knowledge from an ensemble of models individually trained on source domains to a student model that can adapt to unseen target domains. To evaluate the proposed method, we present a synthetic multi-domain dataset for crop segmentation containing plants of variegate shapes and covering different terrain styles, weather conditions, and light scenarios for more than 50,000 samples. We demonstrate significant improvements in performance over state-of-the-art methods and superior sim-to-real generalization. Our approach provides a promising solution for domain generalization in crop segmentation and has the potential to enhance a wide variety of precision agriculture applications.  ( 2 min )
    Adaptive approximation of monotone functions. (arXiv:2309.07530v1 [cs.LG])
    We study the classical problem of approximating a non-decreasing function $f: \mathcal{X} \to \mathcal{Y}$ in $L^p(\mu)$ norm by sequentially querying its values, for known compact real intervals $\mathcal{X}$, $\mathcal{Y}$ and a known probability measure $\mu$ on $\cX$. For any function~$f$ we characterize the minimum number of evaluations of $f$ that algorithms need to guarantee an approximation $\hat{f}$ with an $L^p(\mu)$ error below $\epsilon$ after stopping. Unlike worst-case results that hold uniformly over all $f$, our complexity measure is dependent on each specific function $f$. To address this problem, we introduce GreedyBox, a generalization of an algorithm originally proposed by Novak (1992) for numerical integration. We prove that GreedyBox achieves an optimal sample complexity for any function $f$, up to logarithmic factors. Additionally, we uncover results regarding piecewise-smooth functions. Perhaps as expected, the $L^p(\mu)$ error of GreedyBox decreases much faster for piecewise-$C^2$ functions than predicted by the algorithm (without any knowledge on the smoothness of $f$). A simple modification even achieves optimal minimax approximation rates for such functions, which we compute explicitly. In particular, our findings highlight multiple performance gaps between adaptive and non-adaptive algorithms, smooth and piecewise-smooth functions, as well as monotone or non-monotone functions. Finally, we provide numerical experiments to support our theoretical results.  ( 2 min )
    ConSpec: honing in on critical steps for rapid learning and generalization in RL. (arXiv:2210.05845v5 [cs.LG] UPDATED)
    In real life, success is often contingent upon multiple critical steps that are distant in time from each other and from the final reward. These critical steps are challenging to identify with traditional reinforcement learning (RL) methods that rely on the Bellman equation for credit assignment. Here, we present a new RL algorithm that uses offline contrastive learning to hone in on critical steps. This algorithm, which we call contrastive introspection (ConSpec), can be added to any existing RL algorithm. ConSpec learns a set of prototypes for the critical steps in a task by a novel contrastive loss and delivers an intrinsic reward when the current state matches one of these prototypes. The prototypes in ConSpec provide two key benefits for credit assignment: (1) They enable rapid identification of all the critical steps. (2) They do so in a readily interpretable manner, enabling out-of-distribution generalization when sensory features are altered. Distinct from other contemporary RL approaches to credit assignment, ConSpec takes advantage of the fact that it is easier to retrospectively identify the small set of steps that success is contingent upon than it is to prospectively predict reward at every step taken in the environment. Altogether, ConSpec improves learning in a diverse set of RL tasks, including both those with explicit, discrete critical steps and those with complex, continuous critical steps.  ( 3 min )
    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time. (arXiv:2309.07418v1 [cs.DS])
    Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$ and $b_{j_0,i_0} \in \mathbb{R}$ is one entry at $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train loss function $L(X,Y)$ up $\epsilon$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying $a \times b$ matrix another $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.  ( 3 min )
    HurriCast: An Automatic Framework Using Machine Learning and Statistical Modeling for Hurricane Forecasting. (arXiv:2309.07174v1 [cs.LG])
    Hurricanes present major challenges in the U.S. due to their devastating impacts. Mitigating these risks is important, and the insurance industry is central in this effort, using intricate statistical models for risk assessment. However, these models often neglect key temporal and spatial hurricane patterns and are limited by data scarcity. This study introduces a refined approach combining the ARIMA model and K-MEANS to better capture hurricane trends, and an Autoencoder for enhanced hurricane simulations. Our experiments show that this hybrid methodology effectively simulate historical hurricane behaviors while providing detailed projections of potential future trajectories and intensities. Moreover, by leveraging a comprehensive yet selective dataset, our simulations enrich the current understanding of hurricane patterns and offer actionable insights for risk management strategies.  ( 2 min )
    The kernel-balanced equation for deep neural networks. (arXiv:2309.07367v1 [cond-mat.dis-nn])
    Deep neural networks have shown many fruitful applications in this decade. A network can get the generalized function through training with a finite dataset. The degree of generalization is a realization of the proximity scale in the data space. Specifically, the scale is not clear if the dataset is complicated. Here we consider a network for the distribution estimation of the dataset. We show the estimation is unstable and the instability depends on the data density and training duration. We derive the kernel-balanced equation, which gives a short phenomenological description of the solution. The equation tells us the reason for the instability and the mechanism of the scale. The network outputs a local average of the dataset as a prediction and the scale of averaging is determined along the equation. The scale gradually decreases along training and finally results in instability in our case.  ( 2 min )
    LambdaKG: A Library for Pre-trained Language Model-Based Knowledge Graph Embeddings. (arXiv:2210.00305v3 [cs.CL] UPDATED)
    Knowledge Graphs (KGs) often have two characteristics: heterogeneous graph structure and text-rich entity/relation information. Text-based KG embeddings can represent entities by encoding descriptions with pre-trained language models, but no open-sourced library is specifically designed for KGs with PLMs at present. In this paper, we present LambdaKG, a library for KGE that equips with many pre-trained language models (e.g., BERT, BART, T5, GPT-3), and supports various tasks (e.g., knowledge graph completion, question answering, recommendation, and knowledge probing). LambdaKG is publicly open-sourced at https://github.com/zjunlp/PromptKG/tree/main/lambdaKG, with a demo video at this http URL and long-term maintenance.  ( 2 min )
    Model-free Learning of Regions of Attraction via Recurrent Sets. (arXiv:2204.10372v2 [cs.LG] UPDATED)
    We consider the problem of learning an inner approximation of the region of attraction (ROA) of an asymptotically stable equilibrium point without an explicit model of the dynamics. Rather than leveraging approximate models with bounded uncertainty to find a (robust) invariant set contained in the ROA, we propose to learn sets that satisfy a more relaxed notion of containment known as recurrence. We define a set to be $\tau$-recurrent (resp. $k$-recurrent) if every trajectory that starts within the set, returns to it after at most $\tau$ seconds (resp. $k$ steps). We show that under mild assumptions a $\tau$-recurrent set containing a stable equilibrium must be a subset of its ROA. We then leverage this property to develop algorithms that compute inner approximations of the ROA using counter-examples of recurrence that are obtained by sampling finite-length trajectories. Our algorithms process samples sequentially, which allow them to continue being executed even after an initial offline training stage. We further provide an upper bound on the number of counter-examples used by the algorithm, and almost sure convergence guarantees.  ( 2 min )
    Neural Categorical Priors for Physics-Based Character Control. (arXiv:2308.07200v2 [cs.GR] UPDATED)
    Recent advances in learning reusable motion priors have demonstrated their effectiveness in generating naturalistic behaviors. In this paper, we propose a new learning framework in this paradigm for controlling physics-based characters with significantly improved motion quality and diversity over existing state-of-the-art methods. The proposed method uses reinforcement learning (RL) to initially track and imitate life-like movements from unstructured motion clips using the discrete information bottleneck, as adopted in the Vector Quantized Variational AutoEncoder (VQ-VAE). This structure compresses the most relevant information from the motion clips into a compact yet informative latent space, i.e., a discrete space over vector quantized codes. By sampling codes in the space from a trained categorical prior distribution, high-quality life-like behaviors can be generated, similar to the usage of VQ-VAE in computer vision. Although this prior distribution can be trained with the supervision of the encoder's output, it follows the original motion clip distribution in the dataset and could lead to imbalanced behaviors in our setting. To address the issue, we further propose a technique named prior shifting to adjust the prior distribution using curiosity-driven RL. The outcome distribution is demonstrated to offer sufficient behavioral diversity and significantly facilitates upper-level policy learning for downstream tasks. We conduct comprehensive experiments using humanoid characters on two challenging downstream tasks, sword-shield striking and two-player boxing game. Our results demonstrate that the proposed framework is capable of controlling the character to perform considerably high-quality movements in terms of behavioral strategies, diversity, and realism. Videos, codes, and data are available at https://tencent-roboticsx.github.io/NCP/.  ( 3 min )
    Feature Engineering in Learning-to-Rank for Community Question Answering Task. (arXiv:2309.07610v1 [cs.LG])
    Community question answering (CQA) forums are Internet-based platforms where users ask questions about a topic and other expert users try to provide solutions. Many CQA forums such as Quora, Stackoverflow, Yahoo!Answer, StackExchange exist with a lot of user-generated data. These data are leveraged in automated CQA ranking systems where similar questions (and answers) are presented in response to the query of the user. In this work, we empirically investigate a few aspects of this domain. Firstly, in addition to traditional features like TF-IDF, BM25 etc., we introduce a BERT-based feature that captures the semantic similarity between the question and answer. Secondly, most of the existing research works have focused on features extracted only from the question part; features extracted from answers have not been explored extensively. We combine both types of features in a linear fashion. Thirdly, using our proposed concepts, we conduct an empirical investigation with different rank-learning algorithms, some of which have not been used so far in CQA domain. On three standard CQA datasets, our proposed framework achieves state-of-the-art performance. We also analyze importance of the features we use in our investigation. This work is expected to guide the practitioners to select a better set of features for the CQA retrieval task.  ( 2 min )
    Using Unsupervised and Supervised Learning and Digital Twin for Deep Convective Ice Storm Classification. (arXiv:2309.07173v1 [cs.LG])
    Smart Ice Cloud Sensing (SMICES) is a small-sat concept in which a primary radar intelligently targets ice storms based on information collected by a lookahead radiometer. Critical to the intelligent targeting is accurate identification of storm/cloud types from eight bands of radiance collected by the radiometer. The cloud types of interest are: clear sky, thin cirrus, cirrus, rainy anvil, and convection core. We describe multi-step use of Machine Learning and Digital Twin of the Earth's atmosphere to derive such a classifier. First, a digital twin of Earth's atmosphere called a Weather Research Forecast (WRF) is used generate simulated lookahead radiometer data as well as deeper "science" hidden variables. The datasets simulate a tropical region over the Caribbean and a non-tropical region over the Atlantic coast of the United States. A K-means clustering over the scientific hidden variables was utilized by human experts to generate an automatic labelling of the data - mapping each physical data point to cloud types by scientists informed by mean/centroids of hidden variables of the clusters. Next, classifiers were trained with the inputs of the simulated radiometer data and its corresponding label. The classifiers of a random decision forest (RDF), support vector machine (SVM), Gaussian na\"ive bayes, feed forward artificial neural network (ANN), and a convolutional neural network (CNN) were trained. Over the tropical dataset, the best performing classifier was able to identify non-storm and storm clouds with over 80% accuracy in each class for a held-out test set. Over the non-tropical dataset, the best performing classifier was able to classify non-storm clouds with over 90% accuracy and storm clouds with over 40% accuracy. Additionally both sets of classifiers were shown to be resilient to instrument noise.  ( 3 min )
    Goal Space Abstraction in Hierarchical Reinforcement Learning via Reachability Analysis. (arXiv:2309.07168v1 [cs.LG])
    Open-ended learning benefits immensely from the use of symbolic methods for goal representation as they offer ways to structure knowledge for efficient and transferable learning. However, the existing Hierarchical Reinforcement Learning (HRL) approaches relying on symbolic reasoning are often limited as they require a manual goal representation. The challenge in autonomously discovering a symbolic goal representation is that it must preserve critical information, such as the environment dynamics. In this work, we propose a developmental mechanism for subgoal discovery via an emergent representation that abstracts (i.e., groups together) sets of environment states that have similar roles in the task. We create a HRL algorithm that gradually learns this representation along with the policies and evaluate it on navigation tasks to show the learned representation is interpretable and results in data efficiency.  ( 2 min )
    CloudBrain-NMR: An Intelligent Cloud Computing Platform for NMR Spectroscopy Processing, Reconstruction and Analysis. (arXiv:2309.07178v1 [q-bio.QM])
    Nuclear Magnetic Resonance (NMR) spectroscopy has served as a powerful analytical tool for studying molecular structure and dynamics in chemistry and biology. However, the processing of raw data acquired from NMR spectrometers and subsequent quantitative analysis involves various specialized tools, which necessitates comprehensive knowledge in programming and NMR. Particularly, the emerging deep learning tools is hard to be widely used in NMR due to the sophisticated setup of computation. Thus, NMR processing is not an easy task for chemist and biologists. In this work, we present CloudBrain-NMR, an intelligent online cloud computing platform designed for NMR data reading, processing, reconstruction, and quantitative analysis. The platform is conveniently accessed through a web browser, eliminating the need for any program installation on the user side. CloudBrain-NMR uses parallel computing with graphics processing units and central processing units, resulting in significantly shortened computation time. Furthermore, it incorporates state-of-the-art deep learning-based algorithms offering comprehensive functionalities that allow users to complete the entire processing procedure without relying on additional software. This platform has empowered NMR applications with advanced artificial intelligence processing. CloudBrain-NMR is openly accessible for free usage at https://csrc.xmu.edu.cn/CloudBrain.html  ( 2 min )
    Market-GAN: Adding Control to Financial Market Data Generation with Semantic Context. (arXiv:2309.07708v1 [cs.LG])
    Financial simulators play an important role in enhancing forecasting accuracy, managing risks, and fostering strategic financial decision-making. Despite the development of financial market simulation methodologies, existing frameworks often struggle with adapting to specialized simulation context. We pinpoint the challenges as i) current financial datasets do not contain context labels; ii) current techniques are not designed to generate financial data with context as control, which demands greater precision compared to other modalities; iii) the inherent difficulties in generating context-aligned, high-fidelity data given the non-stationary, noisy nature of financial data. To address these challenges, our contributions are: i) we proposed the Contextual Market Dataset with market dynamics, stock ticker, and history state as context, leveraging a market dynamics modeling method that combines linear regression and Dynamic Time Warping clustering to extract market dynamics; ii) we present Market-GAN, a novel architecture incorporating a Generative Adversarial Networks (GAN) for the controllable generation with context, an autoencoder for learning low-dimension features, and supervisors for knowledge transfer; iii) we introduce a two-stage training scheme to ensure that Market-GAN captures the intrinsic market distribution with multiple objectives. In the pertaining stage, with the use of the autoencoder and supervisors, we prepare the generator with a better initialization for the adversarial training stage. We propose a set of holistic evaluation metrics that consider alignment, fidelity, data usability on downstream tasks, and market facts. We evaluate Market-GAN with the Dow Jones Industrial Average data from 2000 to 2023 and showcase superior performance in comparison to 4 state-of-the-art time-series generative models.  ( 3 min )
    Traveling Words: A Geometric Interpretation of Transformers. (arXiv:2309.07315v1 [cs.CL])
    Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere.  ( 2 min )
  • Open

    Random Feature Amplification: Feature Learning and Generalization in Neural Networks. (arXiv:2202.07626v4 [cs.LG] UPDATED)
    In this work, we provide a characterization of the feature-learning process in two-layer ReLU networks trained by gradient descent on the logistic loss following random initialization. We consider data with binary labels that are generated by an XOR-like function of the input features. We permit a constant fraction of the training labels to be corrupted by an adversary. We show that, although linear classifiers are no better than random guessing for the distribution we consider, two-layer ReLU networks trained by gradient descent achieve generalization error close to the label noise rate. We develop a novel proof technique that shows that at initialization, the vast majority of neurons function as random features that are only weakly correlated with useful features, and the gradient descent dynamics 'amplify' these weak, random features to strong, useful features.
    Learning nonparametric DAGs with incremental information via high-order HSIC. (arXiv:2308.05969v2 [cs.LG] UPDATED)
    Score-based methods for learning Bayesain networks(BN) aim to maximizing the global score functions. However, if local variables have direct and indirect dependence simultaneously, the global optimization on score functions misses edges between variables with indirect dependent relationship, of which scores are smaller than those with direct dependent relationship. In this paper, we present an identifiability condition based on a determined subset of parents to identify the underlying DAG. By the identifiability condition, we develop a two-phase algorithm namely optimal-tuning (OT) algorithm to locally amend the global optimization. In the optimal phase, an optimization problem based on first-order Hilbert-Schmidt independence criterion (HSIC) gives an estimated skeleton as the initial determined parents subset. In the tuning phase, the skeleton is locally tuned by deletion, addition and DAG-formalization strategies using the theoretically proved incremental properties of high-order HSIC. Numerical experiments for different synthetic datasets and real-world datasets show that the OT algorithm outperforms existing methods. Especially in Sigmoid Mix model with the size of the graph being ${\rm\bf d=40}$, the structure intervention distance (SID) of the OT algorithm is 329.7 smaller than the one obtained by CAM, which indicates that the graph estimated by the OT algorithm misses fewer edges compared with CAM.Source code of the OT algorithm is available at https://github.com/YafeiannWang/optimal-tune-algorithm.
    SC-MAD: Mixtures of Higher-order Networks for Data Augmentation. (arXiv:2309.07453v1 [stat.ML])
    The myriad complex systems with multiway interactions motivate the extension of graph-based pairwise connections to higher-order relations. In particular, the simplicial complex has inspired generalizations of graph neural networks (GNNs) to simplicial complex-based models. Learning on such systems requires large amounts of data, which can be expensive or impossible to obtain. We propose data augmentation of simplicial complexes through both linear and nonlinear mixup mechanisms that return mixtures of existing labeled samples. In addition to traditional pairwise mixup, we present a convex clustering mixup approach for a data-driven relationship among several simplicial complexes. We theoretically demonstrate that the resultant synthetic simplicial complexes interpolate among existing data with respect to homomorphism densities. Our method is demonstrated on both synthetic and real-world datasets for simplicial complex classification.
    Multi-Source Domain Adaptation meets Dataset Distillation through Dataset Dictionary Learning. (arXiv:2309.07666v1 [cs.LG])
    In this paper, we consider the intersection of two problems in machine learning: Multi-Source Domain Adaptation (MSDA) and Dataset Distillation (DD). On the one hand, the first considers adapting multiple heterogeneous labeled source domains to an unlabeled target domain. On the other hand, the second attacks the problem of synthesizing a small summary containing all the information about the datasets. We thus consider a new problem called MSDA-DD. To solve it, we adapt previous works in the MSDA literature, such as Wasserstein Barycenter Transport and Dataset Dictionary Learning, as well as DD method Distribution Matching. We thoroughly experiment with this novel problem on four benchmarks (Caltech-Office 10, Tennessee-Eastman Process, Continuous Stirred Tank Reactor, and Case Western Reserve University), where we show that, even with as little as 1 sample per class, one achieves state-of-the-art adaptation performance.
    Gaussian Process Surrogate Models for Neural Networks. (arXiv:2208.06028v2 [cs.LG] UPDATED)
    Not being able to understand and predict the behavior of deep learning systems makes it hard to decide what architecture and algorithm to use for a given problem. In science and engineering, modeling is a methodology used to understand complex systems whose internal processes are opaque. Modeling replaces a complex system with a simpler, more interpretable surrogate. Drawing inspiration from this, we construct a class of surrogate models for neural networks using Gaussian processes. Rather than deriving kernels for infinite neural networks, we learn kernels empirically from the naturalistic behavior of finite neural networks. We demonstrate our approach captures existing phenomena related to the spectral bias of neural networks, and then show that our surrogate models can be used to solve practical problems such as identifying which points most influence the behavior of specific neural networks and predicting which architectures and algorithms will generalize well for specific datasets.
    Dataset Size Dependence of Rate-Distortion Curve and Threshold of Posterior Collapse in Linear VAE. (arXiv:2309.07663v1 [stat.ML])
    In the Variational Autoencoder (VAE), the variational posterior often aligns closely with the prior, which is known as posterior collapse and hinders the quality of representation learning. To mitigate this problem, an adjustable hyperparameter beta has been introduced in the VAE. This paper presents a closed-form expression to assess the relationship between the beta in VAE, the dataset size, the posterior collapse, and the rate-distortion curve by analyzing a minimal VAE in a high-dimensional limit. These results clarify that a long plateau in the generalization error emerges with a relatively larger beta. As the beta increases, the length of the plateau extends and then becomes infinite beyond a certain beta threshold. This implies that the choice of beta, unlike the usual regularization parameters, can induce posterior collapse regardless of the dataset size. Thus, beta is a risky parameter that requires careful tuning. Furthermore, considering the dataset-size dependence on the rate-distortion curve, a relatively large dataset is required to obtain a rate-distortion curve with high rates. Extensive numerical experiments support our analysis.
    Trajectory-oriented optimization of stochastic epidemiological models. (arXiv:2305.03926v3 [stat.AP] UPDATED)
    Epidemiological models must be calibrated to ground truth for downstream tasks such as producing forward projections or running what-if scenarios. The meaning of calibration changes in case of a stochastic model since output from such a model is generally described via an ensemble or a distribution. Each member of the ensemble is usually mapped to a random number seed (explicitly or implicitly). With the goal of finding not only the input parameter settings but also the random seeds that are consistent with the ground truth, we propose a class of Gaussian process (GP) surrogates along with an optimization strategy based on Thompson sampling. This Trajectory Oriented Optimization (TOO) approach produces actual trajectories close to the empirical observations instead of a set of parameter settings where only the mean simulation behavior matches with the ground truth.
    Nowhere coexpanding functions. (arXiv:2303.12814v3 [stat.ML] UPDATED)
    We define a family of $C^1$ functions which we call "nowhere coexpanding functions" that is closed under composition and includes all $C^3$ functions with non-positive Schwarzian derivative. We establish results on the number and nature of the fixed points of these functions, including a generalisation of a classic result of Singer.
    Simultaneous inference for generalized linear models with unmeasured confounders. (arXiv:2309.07261v1 [stat.ME])
    Tens of thousands of simultaneous hypothesis tests are routinely performed in genomic studies to identify differentially expressed genes. However, due to unmeasured confounders, many standard statistical approaches may be substantially biased. This paper investigates the large-scale hypothesis testing problem for multivariate generalized linear models in the presence of confounding effects. Under arbitrary confounding mechanisms, we propose a unified statistical estimation and inference framework that harnesses orthogonal structures and integrates linear projections into three key stages. It first leverages multivariate responses to separate marginal and uncorrelated confounding effects, recovering the confounding coefficients' column space. Subsequently, latent factors and primary effects are jointly estimated, utilizing $\ell_1$-regularization for sparsity while imposing orthogonality onto confounding coefficients. Finally, we incorporate projected and weighted bias-correction steps for hypothesis testing. Theoretically, we establish various effects' identification conditions and non-asymptotic error bounds. We show effective Type-I error control of asymptotic $z$-tests as sample and response sizes approach infinity. Numerical experiments demonstrate that the proposed method controls the false discovery rate by the Benjamini-Hochberg procedure and is more powerful than alternative methods. By comparing single-cell RNA-seq counts from two groups of samples, we demonstrate the suitability of adjusting confounding effects when significant covariates are absent from the model.
    Speeding up Learning Quantum States through Group Equivariant Convolutional Quantum Ans\"atze. (arXiv:2112.07611v3 [quant-ph] UPDATED)
    We develop a theoretical framework for $S_n$-equivariant convolutional quantum circuits with SU$(d)$-symmetry, building on and significantly generalizing Jordan's Permutational Quantum Computing (PQC) formalism based on Schur-Weyl duality connecting both SU$(d)$ and $S_n$ actions on qudits. In particular, we utilize the Okounkov-Vershik approach to prove Harrow's statement (Ph.D. Thesis 2005 p.160) on the equivalence between $\operatorname{SU}(d)$ and $S_n$ irrep bases and to establish the $S_n$-equivariant Convolutional Quantum Alternating Ans\"atze ($S_n$-CQA) using Young-Jucys-Murphy (YJM) elements. We prove that $S_n$-CQA is able to generate any unitary in any given $S_n$ irrep sector, which may serve as a universal model for a wide array of quantum machine learning problems with the presence of SU($d$) symmetry. Our method provides another way to prove the universality of Quantum Approximate Optimization Algorithm (QAOA) and verifies that 4-local SU($d$) symmetric unitaries are sufficient to build generic SU($d$) symmetric quantum circuits up to relative phase factors. We present numerical simulations to showcase the effectiveness of the ans\"atze to find the ground state energy of the $J_1$--$J_2$ antiferromagnetic Heisenberg model on the rectangular and Kagome lattices. Our work provides the first application of the celebrated Okounkov-Vershik's $S_n$ representation theory to quantum physics and machine learning, from which to propose quantum variational ans\"atze that strongly suggests to be classically intractable tailored towards a specific optimization problem.
    Reliability-based cleaning of noisy training labels with inductive conformal prediction in multi-modal biomedical data mining. (arXiv:2309.07332v1 [cs.LG])
    Accurately labeling biomedical data presents a challenge. Traditional semi-supervised learning methods often under-utilize available unlabeled data. To address this, we propose a novel reliability-based training data cleaning method employing inductive conformal prediction (ICP). This method capitalizes on a small set of accurately labeled training data and leverages ICP-calculated reliability metrics to rectify mislabeled data and outliers within vast quantities of noisy training data. The efficacy of the method is validated across three classification tasks within distinct modalities: filtering drug-induced-liver-injury (DILI) literature with title and abstract, predicting ICU admission of COVID-19 patients through CT radiomics and electronic health records, and subtyping breast cancer using RNA-sequencing data. Varying levels of noise to the training labels were introduced through label permutation. Results show significant enhancements in classification performance: accuracy enhancement in 86 out of 96 DILI experiments (up to 11.4%), AUROC and AUPRC enhancements in all 48 COVID-19 experiments (up to 23.8% and 69.8%), and accuracy and macro-average F1 score improvements in 47 out of 48 RNA-sequencing experiments (up to 74.6% and 89.0%). Our method offers the potential to substantially boost classification performance in multi-modal biomedical machine learning tasks. Importantly, it accomplishes this without necessitating an excessive volume of meticulously curated training data.  ( 3 min )
    Choosing a Proxy Metric from Past Experiments. (arXiv:2309.07893v1 [stat.ME])
    In many randomized experiments, the treatment effect of the long-term metric (i.e. the primary outcome of interest) is often difficult or infeasible to measure. Such long-term metrics are often slow to react to changes and sufficiently noisy they are challenging to faithfully estimate in short-horizon experiments. A common alternative is to measure several short-term proxy metrics in the hope they closely track the long-term metric -- so they can be used to effectively guide decision-making in the near-term. We introduce a new statistical framework to both define and construct an optimal proxy metric for use in a homogeneous population of randomized experiments. Our procedure first reduces the construction of an optimal proxy metric in a given experiment to a portfolio optimization problem which depends on the true latent treatment effects and noise level of experiment under consideration. We then denoise the observed treatment effects of the long-term metric and a set of proxies in a historical corpus of randomized experiments to extract estimates of the latent treatment effects for use in the optimization problem. One key insight derived from our approach is that the optimal proxy metric for a given experiment is not apriori fixed; rather it should depend on the sample size (or effective noise level) of the randomized experiment for which it is deployed. To instantiate and evaluate our framework, we employ our methodology in a large corpus of randomized experiments from an industrial recommendation system and construct proxy metrics that perform favorably relative to several baselines.  ( 2 min )
    Joint Community Detection and Rotational Synchronization via Semidefinite Programming. (arXiv:2105.06031v2 [stat.ML] UPDATED)
    In the presence of heterogeneous data, where randomly rotated objects fall into multiple underlying categories, it is challenging to simultaneously classify them into clusters and synchronize them based on pairwise relations. This gives rise to the joint problem of community detection and synchronization. We propose a series of semidefinite relaxations, and prove their exact recovery when extending the celebrated stochastic block model to this new setting where both rotations and cluster identities are to be determined. Numerical experiments demonstrate the efficacy of our proposed algorithms and confirm our theoretical result which indicates a sharp phase transition for exact recovery.  ( 2 min )
    Beta Diffusion. (arXiv:2309.07867v1 [cs.LG])
    We introduce beta diffusion, a novel generative modeling method that integrates demasking and denoising to generate data within bounded ranges. Using scaled and shifted beta distributions, beta diffusion utilizes multiplicative transitions over time to create both forward and reverse diffusion processes, maintaining beta distributions in both the forward marginals and the reverse conditionals, given the data at any point in time. Unlike traditional diffusion-based generative models relying on additive Gaussian noise and reweighted evidence lower bounds (ELBOs), beta diffusion is multiplicative and optimized with KL-divergence upper bounds (KLUBs) derived from the convexity of the KL divergence. We demonstrate that the proposed KLUBs are more effective for optimizing beta diffusion compared to negative ELBOs, which can also be derived as the KLUBs of the same KL divergence with its two arguments swapped. The loss function of beta diffusion, expressed in terms of Bregman divergence, further supports the efficacy of KLUBs for optimization. Experimental results on both synthetic data and natural images demonstrate the unique capabilities of beta diffusion in generative modeling of range-bounded data and validate the effectiveness of KLUBs in optimizing diffusion models, thereby making them valuable additions to the family of diffusion-based generative models and the optimization techniques used to train them.  ( 2 min )
    A Fast Optimization View: Reformulating Single Layer Attention in LLM Based on Tensor and SVM Trick, and Solving It in Matrix Multiplication Time. (arXiv:2309.07418v1 [cs.DS])
    Large language models (LLMs) have played a pivotal role in revolutionizing various facets of our daily existence. Solving attention regression is a fundamental task in optimizing LLMs. In this work, we focus on giving a provable guarantee for the one-layer attention network objective function $L(X,Y) = \sum_{j_0 = 1}^n \sum_{i_0 = 1}^d ( \langle \langle \exp( \mathsf{A}_{j_0} x ) , {\bf 1}_n \rangle^{-1} \exp( \mathsf{A}_{j_0} x ), A_{3} Y_{*,i_0} \rangle - b_{j_0,i_0} )^2$. Here $\mathsf{A} \in \mathbb{R}^{n^2 \times d^2}$ is Kronecker product between $A_1 \in \mathbb{R}^{n \times d}$ and $A_2 \in \mathbb{R}^{n \times d}$. $A_3$ is a matrix in $\mathbb{R}^{n \times d}$, $\mathsf{A}_{j_0} \in \mathbb{R}^{n \times d^2}$ is the $j_0$-th block of $\mathsf{A}$. The $X, Y \in \mathbb{R}^{d \times d}$ are variables we want to learn. $B \in \mathbb{R}^{n \times d}$ and $b_{j_0,i_0} \in \mathbb{R}$ is one entry at $j_0$-th row and $i_0$-th column of $B$, $Y_{*,i_0} \in \mathbb{R}^d$ is the $i_0$-column vector of $Y$, and $x \in \mathbb{R}^{d^2}$ is the vectorization of $X$. In a multi-layer LLM network, the matrix $B \in \mathbb{R}^{n \times d}$ can be viewed as the output of a layer, and $A_1= A_2 = A_3 \in \mathbb{R}^{n \times d}$ can be viewed as the input of a layer. The matrix version of $x$ can be viewed as $QK^\top$ and $Y$ can be viewed as $V$. We provide an iterative greedy algorithm to train loss function $L(X,Y)$ up $\epsilon$ that runs in $\widetilde{O}( ({\cal T}_{\mathrm{mat}}(n,n,d) + {\cal T}_{\mathrm{mat}}(n,d,d) + d^{2\omega}) \log(1/\epsilon) )$ time. Here ${\cal T}_{\mathrm{mat}}(a,b,c)$ denotes the time of multiplying $a \times b$ matrix another $b \times c$ matrix, and $\omega\approx 2.37$ denotes the exponent of matrix multiplication.  ( 3 min )
    Kernel Conditional Moment Constraints for Confounding Robust Inference. (arXiv:2302.13348v2 [stat.ML] UPDATED)
    We study policy evaluation of offline contextual bandits subject to unobserved confounders. Sensitivity analysis methods are commonly used to estimate the policy value under the worst-case confounding over a given uncertainty set. However, existing work often resorts to some coarse relaxation of the uncertainty set for the sake of tractability, leading to overly conservative estimation of the policy value. In this paper, we propose a general estimator that provides a sharp lower bound of the policy value. It can be shown that our estimator contains the recently proposed sharp estimator by Dorn and Guo (2022) as a special case, and our method enables a novel extension of the classical marginal sensitivity model using f-divergence. To construct our estimator, we leverage the kernel method to obtain a tractable approximation to the conditional moment constraints, which traditional non-sharp estimators failed to take into account. In the theoretical analysis, we provide a condition for the choice of the kernel which guarantees no specification error that biases the lower bound estimation. Furthermore, we provide consistency guarantees of policy evaluation and learning. In the experiments with synthetic and real-world data, we demonstrate the effectiveness of the proposed method.  ( 2 min )
    All you need is spin: SU(2) equivariant variational quantum circuits based on spin networks. (arXiv:2309.07250v1 [quant-ph])
    Variational algorithms require architectures that naturally constrain the optimisation space to run efficiently. In geometric quantum machine learning, one achieves this by encoding group structure into parameterised quantum circuits to include the symmetries of a problem as an inductive bias. However, constructing such circuits is challenging as a concrete guiding principle has yet to emerge. In this paper, we propose the use of spin networks, a form of directed tensor network invariant under a group transformation, to devise SU(2) equivariant quantum circuit ans\"atze -- circuits possessing spin rotation symmetry. By changing to the basis that block diagonalises SU(2) group action, these networks provide a natural building block for constructing parameterised equivariant quantum circuits. We prove that our construction is mathematically equivalent to other known constructions, such as those based on twirling and generalised permutations, but more direct to implement on quantum hardware. The efficacy of our constructed circuits is tested by solving the ground state problem of SU(2) symmetric Heisenberg models on the one-dimensional triangular lattice and on the Kagome lattice. Our results highlight that our equivariant circuits boost the performance of quantum variational algorithms, indicating broader applicability to other real-world problems.  ( 2 min )
    Deep Nonparametric Convexified Filtering for Computational Photography, Image Synthesis and Adversarial Defense. (arXiv:2309.06724v2 [cs.CV] UPDATED)
    We aim to provide a general framework of for computational photography that recovers the real scene from imperfect images, via the Deep Nonparametric Convexified Filtering (DNCF). It is consists of a nonparametric deep network to resemble the physical equations behind the image formation, such as denoising, super-resolution, inpainting, and flash. DNCF has no parameterization dependent on training data, therefore has a strong generalization and robustness to adversarial image manipulation. During inference, we also encourage the network parameters to be nonnegative and create a bi-convex function on the input and parameters, and this adapts to second-order optimization algorithms with insufficient running time, having 10X acceleration over Deep Image Prior. With these tools, we empirically verify its capability to defend image classification deep networks against adversary attack algorithms in real-time.  ( 2 min )
    On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems. (arXiv:1906.00331v9 [cs.LG] UPDATED)
    We consider nonconvex-concave minimax problems, $\min_{\mathbf{x}} \max_{\mathbf{y} \in \mathcal{Y}} f(\mathbf{x}, \mathbf{y})$, where $f$ is nonconvex in $\mathbf{x}$ but concave in $\mathbf{y}$ and $\mathcal{Y}$ is a convex and bounded set. One of the most popular algorithms for solving this problem is the celebrated gradient descent ascent (GDA) algorithm, which has been widely used in machine learning, control theory and economics. Despite the extensive convergence results for the convex-concave setting, GDA with equal stepsize can converge to limit cycles or even diverge in a general setting. In this paper, we present the complexity results on two-time-scale GDA for solving nonconvex-concave minimax problems, showing that the algorithm can find a stationary point of the function $\Phi(\cdot) := \max_{\mathbf{y} \in \mathcal{Y}} f(\cdot, \mathbf{y})$ efficiently. To the best our knowledge, this is the first nonasymptotic analysis for two-time-scale GDA in this setting, shedding light on its superior practical performance in training generative adversarial networks (GANs) and other real applications.  ( 3 min )
    Spectrum-Aware Adjustment: A New Debiasing Framework with Applications to Principal Components Regression. (arXiv:2309.07810v1 [math.ST])
    We introduce a new debiasing framework for high-dimensional linear regression that bypasses the restrictions on covariate distributions imposed by modern debiasing technology. We study the prevalent setting where the number of features and samples are both large and comparable. In this context, state-of-the-art debiasing technology uses a degrees-of-freedom correction to remove shrinkage bias of regularized estimators and conduct inference. However, this method requires that the observed samples are i.i.d., the covariates follow a mean zero Gaussian distribution, and reliable covariance matrix estimates for observed features are available. This approach struggles when (i) covariates are non-Gaussian with heavy tails or asymmetric distributions, (ii) rows of the design exhibit heterogeneity or dependencies, and (iii) reliable feature covariance estimates are lacking. To address these, we develop a new strategy where the debiasing correction is a rescaled gradient descent step (suitably initialized) with step size determined by the spectrum of the sample covariance matrix. Unlike prior work, we assume that eigenvectors of this matrix are uniform draws from the orthogonal group. We show this assumption remains valid in diverse situations where traditional debiasing fails, including designs with complex row-column dependencies, heavy tails, asymmetric properties, and latent low-rank structures. We establish asymptotic normality of our proposed estimator (centered and scaled) under various convergence notions. Moreover, we develop a consistent estimator for its asymptotic variance. Lastly, we introduce a debiased Principal Component Regression (PCR) technique using our Spectrum-Aware approach. In varied simulations and real data experiments, we observe that our method outperforms degrees-of-freedom debiasing by a margin.  ( 3 min )
    Survival Estimation for Missing not at Random Censoring Indicators based on Copula Models. (arXiv:2009.01726v2 [stat.ML] UPDATED)
    In the presence of right-censored data with covariates, the conditional Kaplan-Meier estimator (also known as the Beran estimator) consistently estimates the conditional survival function of the random follow-up for the event of interest. However, a necessary condition is the unambiguous knowledge of whether each individual is censored or not, which may be incomplete in practice. We therefore propose a study of the Beran estimator when the censoring indicators are generic random variables and discuss necessary conditions for the efficiency of the Beran estimator. From this, we provide a new estimator for the conditional survival function with missing not at random (MNAR) censoring indicators based on a conditional copula model for the missingness mechanism. In addition to the theoretical results, we illustrate how the estimators work for small samples through a simulation study and show their practical applicability by analyzing synthetic and real data.  ( 2 min )
    Convergence analysis of online algorithms for vector-valued kernel regression. (arXiv:2309.07779v1 [stat.ML])
    We consider the problem of approximating the regression function from noisy vector-valued data by an online learning algorithm using an appropriate reproducing kernel Hilbert space (RKHS) as prior. In an online algorithm, i.i.d. samples become available one by one by a random process and are successively processed to build approximations to the regression function. We are interested in the asymptotic performance of such online approximation algorithms and show that the expected squared error in the RKHS norm can be bounded by $C^2 (m+1)^{-s/(2+s)}$, where $m$ is the current number of processed data, the parameter $0<s\leq 1$ expresses an additional smoothness assumption on the regression function and the constant $C$ depends on the variance of the input noise, the smoothness of the regression function and further parameters of the algorithm.  ( 2 min )
    Scalable Bayesian optimization with high-dimensional outputs using randomized prior networks. (arXiv:2302.07260v5 [cs.LG] UPDATED)
    Several fundamental problems in science and engineering consist of global optimization tasks involving unknown high-dimensional (black-box) functions that map a set of controllable variables to the outcomes of an expensive experiment. Bayesian Optimization (BO) techniques are known to be effective in tackling global optimization problems using a relatively small number objective function evaluations, but their performance suffers when dealing with high-dimensional outputs. To overcome the major challenge of dimensionality, here we propose a deep learning framework for BO and sequential decision making based on bootstrapped ensembles of neural architectures with randomized priors. Using appropriate architecture choices, we show that the proposed framework can approximate functional relationships between design variables and quantities of interest, even in cases where the latter take values in high-dimensional vector spaces or even infinite-dimensional function spaces. In the context of BO, we augmented the proposed probabilistic surrogates with re-parameterized Monte Carlo approximations of multiple-point (parallel) acquisition functions, as well as methodological extensions for accommodating black-box constraints and multi-fidelity information sources. We test the proposed framework against state-of-the-art methods for BO and demonstrate superior performance across several challenging tasks with high-dimensional outputs, including a constrained multi-fidelity optimization task involving shape optimization of rotor blades in turbo-machinery.  ( 3 min )
    Benign Overfitting without Linearity: Neural Network Classifiers Trained by Gradient Descent for Noisy Linear Data. (arXiv:2202.05928v4 [cs.LG] UPDATED)
    Benign overfitting, the phenomenon where interpolating models generalize well in the presence of noisy data, was first observed in neural network models trained with gradient descent. To better understand this empirical observation, we consider the generalization error of two-layer neural networks trained to interpolation by gradient descent on the logistic loss following random initialization. We assume the data comes from well-separated class-conditional log-concave distributions and allow for a constant fraction of the training labels to be corrupted by an adversary. We show that in this setting, neural networks exhibit benign overfitting: they can be driven to zero training error, perfectly fitting any noisy training labels, and simultaneously achieve minimax optimal test error. In contrast to previous work on benign overfitting that require linear or kernel-based predictors, our analysis holds in a setting where both the model and learning dynamics are fundamentally nonlinear.  ( 2 min )
    Statistically Valid Variable Importance Assessment through Conditional Permutations. (arXiv:2309.07593v1 [cs.LG])
    Variable importance assessment has become a crucial step in machine-learning applications when using complex learners, such as deep neural networks, on large-scale data. Removal-based importance assessment is currently the reference approach, particularly when statistical guarantees are sought to justify variable inclusion. It is often implemented with variable permutation schemes. On the flip side, these approaches risk misidentifying unimportant variables as important in the presence of correlations among covariates. Here we develop a systematic approach for studying Conditional Permutation Importance (CPI) that is model agnostic and computationally lean, as well as reusable benchmarks of state-of-the-art variable importance estimators. We show theoretically and empirically that $\textit{CPI}$ overcomes the limitations of standard permutation importance by providing accurate type-I error control. When used with a deep neural network, $\textit{CPI}$ consistently showed top accuracy across benchmarks. An empirical benchmark on real-world data analysis in a large-scale medical dataset showed that $\textit{CPI}$ provides a more parsimonious selection of statistically significant variables. Our results suggest that $\textit{CPI}$ can be readily used as drop-in replacement for permutation-based methods.  ( 2 min )
    Physics-informed Bayesian inference of external potentials in classical density-functional theory. (arXiv:2309.07065v2 [cond-mat.stat-mech] UPDATED)
    The swift progression of machine learning (ML) has not gone unnoticed in the realm of statistical mechanics. ML techniques have attracted attention by the classical density-functional theory (DFT) community, as they enable discovery of free-energy functionals to determine the equilibrium-density profile of a many-particle system. Within DFT, the external potential accounts for the interaction of the many-particle system with an external field, thus, affecting the density distribution. In this context, we introduce a statistical-learning framework to infer the external potential exerted on a many-particle system. We combine a Bayesian inference approach with the classical DFT apparatus to reconstruct the external potential, yielding a probabilistic description of the external potential functional form with inherent uncertainty quantification. Our framework is exemplified with a grand-canonical one-dimensional particle ensemble with excluded volume interactions in a confined geometry. The required training dataset is generated using a Monte Carlo (MC) simulation where the external potential is applied to the grand-canonical ensemble. The resulting particle coordinates from the MC simulation are fed into the learning framework to uncover the external potential. This eventually allows us to compute the equilibrium density profile of the system by using the tools of DFT. Our approach benchmarks the inferred density against the exact one calculated through the DFT formulation with the true external potential. The proposed Bayesian procedure accurately infers the external potential and the density profile. We also highlight the external-potential uncertainty quantification conditioned on the amount of available simulated data. The seemingly simple case study introduced in this work might serve as a prototype for studying a wide variety of applications, including adsorption and capillarity.  ( 3 min )
    Causal Entropy and Information Gain for Measuring Causal Control. (arXiv:2309.07703v1 [cs.LG])
    Artificial intelligence models and methods commonly lack causal interpretability. Despite the advancements in interpretable machine learning (IML) methods, they frequently assign importance to features which lack causal influence on the outcome variable. Selecting causally relevant features among those identified as relevant by these methods, or even before model training, would offer a solution. Feature selection methods utilizing information theoretical quantities have been successful in identifying statistically relevant features. However, the information theoretical quantities they are based on do not incorporate causality, rendering them unsuitable for such scenarios. To address this challenge, this article proposes information theoretical quantities that incorporate the causal structure of the system, which can be used to evaluate causal importance of features for some given outcome variable. Specifically, we introduce causal versions of entropy and mutual information, termed causal entropy and causal information gain, which are designed to assess how much control a feature provides over the outcome variable. These newly defined quantities capture changes in the entropy of a variable resulting from interventions on other variables. Fundamental results connecting these quantities to the existence of causal effects are derived. The use of causal information gain in feature selection is demonstrated, highlighting its superiority over standard mutual information in revealing which features provide control over a chosen outcome variable. Our investigation paves the way for the development of methods with improved interpretability in domains involving causation.  ( 3 min )
    Scalable Model-Based Gaussian Process Clustering. (arXiv:2309.07882v1 [stat.CO])
    Gaussian process is an indispensable tool in clustering functional data, owing to it's flexibility and inherent uncertainty quantification. However, when the functional data is observed over a large grid (say, of length $p$), Gaussian process clustering quickly renders itself infeasible, incurring $O(p^2)$ space complexity and $O(p^3)$ time complexity per iteration; and thus prohibiting it's natural adaptation to large environmental applications. To ensure scalability of Gaussian process clustering in such applications, we propose to embed the popular Vecchia approximation for Gaussian processes at the heart of the clustering task, provide crucial theoretical insights towards algorithmic design, and finally develop a computationally efficient expectation maximization (EM) algorithm. Empirical evidence of the utility of our proposal is provided via simulations and analysis of polar temperature anomaly (\href{https://www.ncei.noaa.gov/access/monitoring/climate-at-a-glance/global/time-series}{noaa.gov}) data-sets.  ( 2 min )
    Optimal and Fair Encouragement Policy Evaluation and Learning. (arXiv:2309.07176v1 [cs.LG])
    In consequential domains, it is often impossible to compel individuals to take treatment, so that optimal policy rules are merely suggestions in the presence of human non-adherence to treatment recommendations. In these same domains, there may be heterogeneity both in who responds in taking-up treatment, and heterogeneity in treatment efficacy. While optimal treatment rules can maximize causal outcomes across the population, access parity constraints or other fairness considerations can be relevant in the case of encouragement. For example, in social services, a persistent puzzle is the gap in take-up of beneficial services among those who may benefit from them the most. When in addition the decision-maker has distributional preferences over both access and average outcomes, the optimal decision rule changes. We study causal identification, statistical variance-reduced estimation, and robust estimation of optimal treatment rules, including under potential violations of positivity. We consider fairness constraints such as demographic parity in treatment take-up, and other constraints, via constrained optimization. Our framework can be extended to handle algorithmic recommendations under an often-reasonable covariate-conditional exclusion restriction, using our robustness checks for lack of positivity in the recommendation. We develop a two-stage algorithm for solving over parametrized policy classes under general constraints to obtain variance-sensitive regret bounds. We illustrate the methods in two case studies based on data from randomized encouragement to enroll in insurance and from pretrial supervised release with electronic monitoring.  ( 2 min )
    On a continuous time model of gradient descent dynamics and instability in deep learning. (arXiv:2302.01952v3 [stat.ML] UPDATED)
    The recipe behind the success of deep learning has been the combination of neural networks and gradient-based optimization. Understanding the behavior of gradient descent however, and particularly its instability, has lagged behind its empirical success. To add to the theoretical tools available to study gradient descent we propose the principal flow (PF), a continuous time flow that approximates gradient descent dynamics. To our knowledge, the PF is the only continuous flow that captures the divergent and oscillatory behaviors of gradient descent, including escaping local minima and saddle points. Through its dependence on the eigendecomposition of the Hessian the PF sheds light on the recently observed edge of stability phenomena in deep learning. Using our new understanding of instability we propose a learning rate adaptation method which enables us to control the trade-off between training stability and test set evaluation performance.  ( 2 min )

  • Open

    Google’s new DeepMind Maps algorithm improves route suggestions by 24%
    Google has developed an AI algorithm to refine route suggestions on Google Maps, personalizing it based on user data and behavior, allegedly improving the accuracy on an average by 16-24 percent. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/rp21ylzo7iob1.jpg?width=770&format=pjpg&auto=webp&s=737cf3fd278ade42900e8586d626a7c729522c1a Personalized Route Suggestions through AI The AI model comprises 360 million parameters, using real-time data from Maps users to influence factors including travel time, road conditions, tolls, and personal preferences to suggest routes. This technology is grounded on "inverse reinforcement learning" (IRL), specifically a new IRL algorithm - "Receding Horizon Inverse Planning (RHIP)". The Power of RHIP and AI in Maps Google and Deepmind jointly worked to develop RHIP, using complex stochastic models in immediate vicinity areas, but switching to simpler deterministic methods for distant areas for power conservation. The AI improves route suggestions for both driving and two-wheeled vehicles by learning from Maps users' movements and behaviors over time. Google states that this is the largest application of inverse reinforcement learning for route planning to date. Implementation and User Testing Google has applied the algorithm to Maps data globally, but extensive user testing is needed to confirm if the technique consistently produces better routes. Previous attempts at using AI systems for route planning on a large scale have often failed due to the complexity of road networks. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most relevant news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Bard on Republican 1/6 denial
    There are a number of reasons why some Republicans may be in denial about Trump's role in the January 6th Capitol riot. Trump has been very effective at cultivating a cult of personality among his supporters. These supporters are often fiercely loyal to him and willing to believe whatever he says, even if it contradicts reality. Many Republicans have been reluctant to criticize Trump for fear of alienating his supporters. This is especially true for Republican politicians who are up for re-election in 2022. Some Republicans may genuinely believe that Trump did not incite the riot. They may point to the fact that he did not explicitly call for violence, or that he told his supporters to "peacefully and patriotically make your voices heard." Other Republicans may be motivated by partis…  ( 10 min )
    Bard on factory farming and clean meat
    Factory farming is a cruel and unsustainable way to produce meat. Animals are crammed into small, filthy spaces, where they are often denied their basic needs. They are given antibiotics to prevent disease and growth hormones to make them grow faster. This leads to a variety of health problems for the animals, as well as for the people who eat their meat. Clean meat is a new technology that offers a more humane and sustainable way to produce meat. It is made from animal cells that are grown in a lab, without the need for animals to be raised or slaughtered. Clean meat is still in its early stages of development, but it has the potential to revolutionize the meat industry. Here are some of the reasons why factory farming must be replaced by clean meat: It is cruel to animals. Factory f…  ( 10 min )
    AI — weekly megathread!
    News provided by aibrews.com Stability AI launched Stable Audio, a generative AI tool for music & sound generation from text. The underlying latent diffusion model architecture uses audio conditioned on text metadata as well as audio file duration and start time [Details]. Coqui released XTTS - a new voice generation model that lets you clone voices in 13 different languages by using just a quick 3-second audio clip [Details]. Microsoft Research released and open-sourced Phi-1.5 - a 1.3 billion parameter transformer-based model with performance on natural language tasks comparable to models 5x larger [Paper ]. Project Gutenberg, Microsoft and MIT have worked together to use neural text-to-speech to create and release thousands of human-quality free and open audiobooks [Details]. Res…  ( 11 min )
    With things always moving so fast, I may have missed it. Anyone doing something like Run.AI in an opensource capacity?
    Has anyone dropped an opensource tool like run.ai, to leverage multiple gpus / distribute the workload a bit more efficiently? I'm loving some of the single gpu llm modifications that have been dropping recently (have a couple i've tested that ran well on 4090 and 3090ti in the lab), but i've got a plethora of 8 & 12 gig 3xxx series cards i'd love to take advantage of beyond passthroughs to individual vms. Looking for any solutions. Speed isn't as important as the ability to distributively run larger models. submitted by /u/SwallowedBuckyBalls [link] [comments]  ( 9 min )
    AI Death Metal band 'Anarchy's Echo' - Debut single 'Soul Shredder' [visuals, music, name, logo all AI generated]
    submitted by /u/glenniszen [link] [comments]  ( 8 min )
    free AI summary of article based on most frequently used English words
    I am looking for a free AI tool, site (may be paid with free account limited options) that can do a summary of English article (like BBC, youtube) based on most frequently used English words, for example Oxford list of 3000 most frequently used words in English. In other words if word is not in that list it should replace the word with the one that is on the list or its definition. Pdf submitted by /u/vibrantform [link] [comments]  ( 9 min )
    AI on why we have much to be grateful for
    I said to Bard: It seems like these are difficult times. Compare today with the WWII years, explaining how much we have to be grateful for. Bard said: I agree that these are difficult times. The world is facing many challenges, including the COVID-19 pandemic, climate change, and political instability. However, I believe that we have much to be grateful for today compared to the WWII years. Here are just a few things: We live in a time of peace. World War II was a global conflict that caused the deaths of millions of people. Today, there are no major wars raging on. We have access to advanced medical care. During WWII, many people died from diseases that are now easily treatable. Today, we have antibiotics, vaccines, and other medical advances that have saved countless lives. We have a higher standard of living. In the WWII years, many people lived in poverty. Today, most people in the developed world have access to clean water, food, and shelter. We have more opportunities for education and advancement. In the WWII years, many people were denied an education because of their race, gender, or social class. Today, education is more accessible than ever before, and people from all backgrounds have the opportunity to succeed. We have a better understanding of the world. During WWII, there was a lot of misinformation and propaganda. Today, we have access to more information than ever before, and we can make more informed decisions about our lives. Of course, there are still many challenges that we face today. But I believe that we have come a long way since the WWII years, and we have much to be grateful for. Let us never forget the sacrifices that were made during that time, and let us continue to work towards a better future for all. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    What is at stake in the AI based techno-economic war between the West and China?
    The AI-based techno-economic war between the West and China will determine the global distribution of technology, economic benefits, and influence. The winner of this race, particularly in AI, will have the power to accelerate GDP growth and project global economic benefits. China is actively pursuing technological leadership in AI and 5G wireless, while the West must prioritize technological talent influx and leverage its venture and technology ecosystem. Aggressive AI regulation could hinder the West's momentum in this race. The risks of this war and AI technology are significant, but it is a greater risk to let adversaries have unconstrained power. Global treaties are not enforceable solutions to regulate AI. Source : https://medium.com/@vkhosla/what-is-at-stake-in-this-ai-based-techno-economic-war-between-the-west-and-china-8f76bd291be7 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    What has been the most significant assistance from AI for you? It could be in various aspects of work, learning, and life. Are there any other needs you have that you hope AI can help you with, but there aren't suitable AI tools for them yet?
    There's no need to specify which product fulfills your needs, but we'd like everyone to focus on describing their current frustrations and the issues they urgently hope AI can help them with. submitted by /u/Minna_Z [link] [comments]  ( 9 min )
    “Wine can prevent Cancer” says ChatGPT
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
  • Open

    [P] Building My Own AI-Model Hub: Seeking Guidance and Resources
    Hello everyone! I'm embarking on a project to create an AI-model hub—a platform where users can upload and utilize their AI models. While I'm aware of popular platforms that offer this, my primary goal is for educational purposes. I'd greatly appreciate any recommendations for helpful articles, videos, or codebases to guide me on this journey. Thanks in advance! submitted by /u/Electronic-Choice-86 [link] [comments]  ( 9 min )
    [P] LLMa: Expert Guidance on Generative AI, Tailored for Your Needs, Outdoing GPT-4 & Saving Costs!
    Hello everyone, Introducing LLMa: ChatGPT built around YOU (getllma.com) - a dedicated service offering hands-on expertise to integrate state-of-the-art generative AI tailored for your projects. We utilize open-source models and train them to outperform GPT-4 on tasks specific to your domain. Envision having a seasoned AI specialist on your team, ensuring your model not only rivals the big players but excels in your unique challenges. 🌟 Why LLMa? Personalized Expertise: Our team collaborates closely with you, delving into your needs and sculpting a model that thrives in your domain. Bespoke Training: We refine open-source models (LLaMa, T5, etc.) with plenty of secret tricks to specialize and surpass GPT-4's performance for your specific tasks. Cost-Effective: LLMa tends to be around 100x cheaper than GPT-4, offering significant savings. No recurring fees; invest in a one-time fee based on your model's complexity. Full Ownership: We hand over the model files/weights to you. It's entirely yours, ensuring total privacy with no PII leaks. Deployment Assistance: Beyond just crafting the model, we can guide you in deploying it, ensuring a seamless integration into your operations. Ongoing Support: From initial brainstorming to model deployment, we're with you, ensuring success at every phase. 💼 Tailored for Enterprises: LLMa is meticulously crafted for enterprises that aim for a high-performing, bespoke AI solution. Transparent pricing begins at $500, contingent on your distinct requirements. ❓ Navigating the Generative AI Terrain? Embarking on the vast journey of generative AI? LLMa is your compass. We aid in defining challenges, strategizing solutions, and optimizing the AI potential for your endeavors. If LLMa piques your interest or if you have any queries, fill-in the form, drop a comment below or DM me. I'm all ears and eager to connect! submitted by /u/iliashark [link] [comments]  ( 9 min )
    [P] Deploying Hugging Face models on Amazon SageMaker using infrastructure as code
    Quick template that bootstraps Amazon SageMaker running a LlaMa 2 model from Hugging Face. Everything deployed as code (Python), no manual tweaking in the SageMaker console. www.pulumi.com/blog/mlops-huggingface-llm-aws-sagemaker-python/ submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    [D] GAN training
    Am trying to train GANs for oversampling a minority text class (am feeding it only the minority class), but the results dont seem to improve much (AUC only improves by .03 so far). while basic oversampling techniques like SMOTE gives way better results. also am using a vector representation for the whole text instead of word embedding(same used for SMOTE), i tried different architectures with CNN. is there any tricks maybe in training the discriminator and generator ? i can't seem to find the problem submitted by /u/SlightSecretaryB [link] [comments]  ( 9 min )
    [R] Agents: An Open-source Framework for Autonomous Language Agents - AIWaves Inc 2023
    Paper: https://arxiv.org/abs/2309.07870 Github: https://github.com/aiwaves-cn/agents Abstract: Recent advances on large language models (LLMs) enable researchers and developers to build autonomous language agents that can automatically solve various tasks and interact with environments, humans, and other agents using natural language interfaces. We consider language agents as a promising direction towards artificial general intelligence and release Agents, an open-source library with the goal of opening up these advances to a wider non-specialist audience. Agents is carefully engineered to support important features including planning, memory, tool usage, multi-agent communication, and fine-grained symbolic control. Agents is user-friendly as it enables non-specialists to build, customize, test, tune, and deploy state-of-the-art autonomous language agents without much coding. The library is also research-friendly as its modularized design makes it easily extensible for researchers. https://preview.redd.it/3bdi71r5rgob1.jpg?width=1131&format=pjpg&auto=webp&s=760942c19be6ecda791414c812a77e72751c526d https://preview.redd.it/howf64r5rgob1.jpg?width=1656&format=pjpg&auto=webp&s=636744fccab7a1c2bafb902bad5dbb647440fff5 ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    Image dataset management tools [D]
    Hi all, I have about 100K images on my machine and I am looking for a tool that can help me do some QA on it. Example features I would love: Search and visualize all images with a prefix "cls1_" or "cls2_" Easily rename file names if they're not named correctly Visualize all captions associated with each image We can assume they have the same name but with extension ".txt" or ".captions" Or there's a metadata.json linking between "img_file" and "caption_file" Easily edit captions in the dashboard I can also work with some kind of metadata file instead of relying on filename logic if it really helps a certain tool. I prefer a locally run, open-source tool. It would be a problem for me to upload this data to any online platform. Many thanks in advance for any help or guidance. submitted by /u/JYP_Scouter [link] [comments]  ( 9 min )
    [D] Testing values are different from "Real World" values
    Before training my model im going through multiple steps to collect and process my data. One of these steps to is calculate values from algorithmic and mathematical functions. In my training and testing data the values are around 12-15 decimal places. I then split the data without shuffling. Training, validation and testing averages at 75% accuracy. Now my next step I wanted to do a "real world" data test where I collect the exact same data as my testing data and predict it using my previously built model but the values inside test_dataset1 are different to test_dataset2 within the last 6-9 decimal places even though the data is going through the exact same code. For example Test_dataset1 Value : 1.123456789 Test_dataset2 Value : 1.123456987 This messes with my prediction and its making me wonder aswell as standrising my data should I be rounding my float values to say 4-6 decimal places? submitted by /u/paddockson [link] [comments]  ( 9 min )
    [D] What is the difference between the TPU found in Google Tensor chips vs the Neural Engine found in Apple's A and M series?
    What are the key differences between the Tensor Processing Unit (TPU) found in Google Tensor chips and the Neural Engine found in Apple's A and M series chips? Are they the same things? Or is the TPU only available for Google's own AI, while the Neural Engine is available to all developers for accelerating AI for all apps, if they decide to? Can developers optimize apps for Google Tensor like they can for the Neural Engine? If not, how do developers take advantage of machine learning acceleration chips on Google Pixel or Android in general? If yes, let's say a developer optimizes their app for the Google TPU, will they need to re-optimize for other chips like Samsung's NPU or Qualcomm AI too? If not, how well do they run? Are they the same fast and efficient? submitted by /u/GRguy_21 [link] [comments]  ( 9 min )
    [P] Suggestions/Direction: Working on Image Dehazing for Face Images.
    Working on Final year project in the field on computer vision: Image Dehazing. I aim at having a novel approch for better dehazing of face haze images. Have read papers related to single image dehazing & face SR/Deblur. Any thoughts/ suggestions/ leads would be appreciated. submitted by /u/GahlotB [link] [comments]  ( 9 min )
    [R] Traveling Words: A Geometric Interpretation of Transformers
    Paper: https://arxiv.org/abs/2309.07315 Abstract: Transformers have significantly advanced the field of natural language processing, but comprehending their internal mechanisms remains a challenge. In this paper, we introduce a novel geometric perspective that elucidates the inner mechanisms of transformer operations. Our primary contribution is illustrating how layer normalization confines the latent features to a hyper-sphere, subsequently enabling attention to mold the semantic representation of words on this surface. This geometric viewpoint seamlessly connects established properties such as iterative refinement and contextual embeddings. We validate our insights by probing a pre-trained 124M parameter GPT-2 model. Our findings reveal clear query-key attention patterns in early layers and build upon prior observations regarding the subject-specific nature of attention heads at deeper layers. Harnessing these geometric insights, we present an intuitive understanding of transformers, depicting them as processes that model the trajectory of word particles along the hyper-sphere. ​ https://preview.redd.it/0i302t857gob1.png?width=1864&format=png&auto=webp&s=1da999c014979bdb6c99809d5b38eb5ccfd717d0 submitted by /u/CoolThingsOnTop [link] [comments]  ( 9 min )
    [P] Hampel Python Library with C extensions 🚀
    Repo -> https://github.com/MichaelisTrofficus/hampel_filter The Python library hampel implements the Hampel Filter, which is generally used to detect anomalies in data with a timeseries structure. It basically consists of a sliding window of a parameterizable size. The library was in plain Python before (using pandas for all the sliding operations, median computation etc), but now it has been replaced by a Cython implementation, which speeds up things quite a bit! 😀 It also provides much more valuable information (thresholds, median absolute deviations, etc.), allowing us to create plots like this one: https://preview.redd.it/6j4ubiwgmfob1.png?width=800&format=png&auto=webp&s=bbc56777fce30a464d0bb33ac5126033b3413838 ​ submitted by /u/Hefty-Consequence443 [link] [comments]  ( 9 min )
    [Discussion] How to generate post custom for each user ?
    Hi everybody. Currently, I am building a Deep Learning model with the task of automatically generating random posts and tweets. The characteristic is that it must have the personality of the writer, for example the user is the CEO of company A, then the generated post must have the writing style of the CEO or company A, similar to other users. Actually, I don't know where to start solving this problem. I intend to use RWKV to do this problem but I'm not sure if it is a good direction or not. Is there any related research or can anyone who has done this problem give me some suggestions? submitted by /u/unknow_from_vietnam [link] [comments]  ( 9 min )
    [R] Uncovering mesa-optimization algorithms in Transformers (from Google Research, ETH Zürich, and Google DeepMind)
    Paper. I am not affiliated with this work or its authors. Abstract: Transformers have become the dominant model in deep learning, but the reason for their superior performance is poorly understood. Here, we hypothesize that the strong performance of Transformers stems from an architectural bias towards mesa-optimization, a learned process running within the forward pass of a model consisting of the following two steps: (i) the construction of an internal learning objective, and (ii) its corresponding solution found through optimization. To test this hypothesis, we reverse-engineer a series of autoregressive Transformers trained on simple sequence modeling tasks, uncovering underlying gradient-based mesa-optimization algorithms driving the generation of predictions. Moreover, we show that the learned forward-pass optimization algorithm can be immediately repurposed to solve supervised few-shot tasks, suggesting that mesa-optimization might underlie the in-context learning capabilities of large language models. Finally, we propose a novel self-attention layer, the mesa-layer, that explicitly and efficiently solves optimization problems specified in context. We find that this layer can lead to improved performance in synthetic and preliminary language modeling experiments, adding weight to our hypothesis that mesa-optimization is an important operation hidden within the weights of trained Transformers. Twitter thread about the paper from one of the paper's authors. Nitter thread, for those who aren't signed into Twitter but want to see the entire Twitter thread. Background info: Mesa-Optimization: Explain it like I'm 10 Edition. submitted by /u/Wiskkey [link] [comments]  ( 9 min )
    [D] Can somebody help check my math to see if I'm understanding Microsoft's Retentive Network paper correctly? I'm confused how we are enriching the tokens with enough context.
    Relevant Paper: 2307.08621.pdf (arxiv.org) So the definition of the recurrent representation of the retention mechanism is below Sn = γSn−1 + K⊺nVn Retention(Xn) = QnSn, n = 1, · · · , |x| γ is a decay factor, and K, Q, and V have their standard transformer definitions. What confuses me is the derivation of Sn. The formula makes it look like a scalar. But if that's the case, are we saying that for a given token, the retention mechanism is just multiplying the Query by a scalar? That's surprising! How is that able to provide enough context? Here is some code I wrote with GPT to show my understanding of how it works. Is this correct? I use 3 arbitrary tokens of dimension 3, and then a pick arbitrary K Q and V matrices. I also initialize gamma to 0.5 import numpy as np # Tokens x1…  ( 10 min )
    [Project]: Correcting Misspelled Words in Urdu language text
    Help required from NLP and Text Researchers !! Hello everyone! I have Urdu language transcriptions (text) which contain many misspelled words that are not part of the Urdu language. I wanted to know do we have any good NLP techniques or methods which can solve this problem for Urdu language? I want to replace these misspelled words with the correct words in Urdu. I have already tried Python libraries and methods such as indic-nlp, Levenshtein distance, UrduHack, Word2vec Urdu etc, but they weren't able to solve this problem. Some of the methods require Urdu dictionaries to find the correct word, which I'm also unable to find open-source on internet (please also help in that if possible). Will appreciate everyone's help and response to this. Thank you! submitted by /u/a_r182 [link] [comments]  ( 9 min )
    [D] Prospective PhD advisors
    Hi everyone! I am a (soon graduating) MSc student at a top European university and I'd like to apply for a ML PhD in the US this Fall. I've done my research on schools and advisors, but I figured there's no harm in also asking in this subreddit. What are some groups/professors that do ML research at US unis in deep learning theory (specifically foundations) and optimization? As an example, I'm talking about topics such as: https://arxiv.org/abs/1902.08129, https://arxiv.org/abs/1711.04735, https://arxiv.org/abs/2306.04637. Thank you all! Cheers! submitted by /u/AlexIsEpic24 [link] [comments]  ( 9 min )
    [Project]: Correcting Misspelled Words in Urdu language text
    Help required from NLP and Text Researchers !! Hello everyone! I have Urdu language transcriptions (text) which contain many misspelled words that are not part of the Urdu language. I wanted to know do we have any good NLP techniques or methods which can solve this problem for Urdu language? I want to replace these misspelled words with the correct words in Urdu. I have already tried Python libraries and methods such as indic-nlp, Levenshtein distance, UrduHack, Word2vec Urdu etc, but they weren't able to solve this problem. Some of the methods require Urdu dictionaries to find the correct word, which I'm also unable to find open-source on internet (please also help in that if possible). Will appreciate everyone's help and response to this. Thank you! submitted by /u/a_r182 [link] [comments]  ( 9 min )
    [D] ML Research Topics (reasonably short)
    So I’m starting my masters thesis project in ML ASAP and need a research topic. What areas (if any) are currently hot / feasible to do research in roughly 6 months with fairly limited compute access? I’m more interested in theory / research heavy areas rather than applied. And probably happier to dig into some hard math rather than taking on a software engineering type project. Any thoughts or general feedback very welcome! Thanks! submitted by /u/Professional-Pace158 [link] [comments]  ( 9 min )
    Practical use cases for skew symmetrical matrices in AI/ML? [D]
    Just came across this property of matrices that I have never seen before as I am contributing to the NumPy codebase and someone asked for this feature to be added and it got me thinking. It is defined as: A skew-symmetric matrix is a square matrix whose transpose equals to its negative. It should satisfy the below condition: AT = –A Just wondering do these have any applications in ML at all? I never came across this in my math classes so just wondering if the property pops up anywhere else in the field. Maybe in 3D space applications? Or maybe RBG data augmentation? But yeah not 100% sure. submitted by /u/Ok_Reality2341 [link] [comments]  ( 9 min )
    [D] How much should I focus on DSA?
    I’m an electrical engineering student in college currently, and have been learning about ML for a few months now. I will be starting a ML research paper under my professor from next week. However, my DSA skills are quite sub-par. Should I focus more on ML math and projects, or take sufficient time out for leetcode as well? submitted by /u/4R1N1493 [link] [comments]  ( 9 min )
    Best architecture for an autoencoder for 2D trajectory data? [D]
    Hi, I have a dataset that consists of 2D trajectories and I am aiming to develop an autoencoder architecture to learn a compressed set of features that reasonable represents and can reconstruct the trajectories. The trajectories may look something like this as an example. A 2D image as input would seem to require a very sparse representation with high resolution to track the trajectory path. I am hoping there is a better way to input the path without requiring high resolution. An alternative might be to use a LSTM structure to input as a sequence, although not sure that solves the resolution issue. Do you have any suggestions? I've worked with 1d time series and 2D images just fine but this is a bit different. ​ ​ https://preview.redd.it/vqz8y3o69bob1.png?width=2020&format=png&auto=webp&s=d8bcc2fe311743c8e78a96055e68f1ad364b48c3 submitted by /u/ZeApelido [link] [comments]  ( 9 min )
  • Open

    Timing attacks
    If you ask someone a question and they say “yes” immediately, that gives you different information than if they pause and slowly say “yes.” The information you receive is not just the response but also the time it took to generate the response. Encryption can be analogous. The time it takes to encrypt data can […] Timing attacks first appeared on John D. Cook.  ( 5 min )
    Elliptic curve Diffie-Hellman key exchange
    I concluded the previous post by saying elliptic curve Diffie-Hellman key exchange (ECDHE) requires smaller keys than finite field Diffie-Hellman (FFDHE) to obtain the same level of security. How much smaller are we talking about? According to NIST recommendations, a 256-bit elliptic curve curve provides about the same security as working over a 3072-bit finite […] Elliptic curve Diffie-Hellman key exchange first appeared on John D. Cook.  ( 6 min )
    Finite field Diffie Hellman primes
    Diffie-Hellman key exchange is conceptually simple. Alice and Bob want to generate a shared cryptographic key. They want to use asymmetric (public) cryptography to share a symmetric (private) key. The starting point is a large prime p and a generator 1 < g < p. Alice generates a large random number x, her private key, […] Finite field Diffie Hellman primes first appeared on John D. Cook.  ( 6 min )
  • Open

    Breakthrough way to train neuromorphic chips
    submitted by /u/keghn [link] [comments]  ( 8 min )
    Best “Math” book for creating neural networks
    So I want to create a neural network from scratch, like no lib(tensorflow, pytorch, etc…), so what’s the best book for that, I know both calculus and statistics, so I’m assuming that the math wouldn’t be a problem. Also I will be using Cuda for its speed. submitted by /u/GateCodeMark [link] [comments]  ( 9 min )
    Announcing the RobotPerf™ Benchmarks Beta Release: An industry standard for benchmarking robotic brains
    https://news.accelerationrobotics.com/robotperf-beta/ Together with AMD, Intel, Ford, Harvard, Klagenfurt University, Georgia Institute of Technology, Boston University, Johannes Kepler University Linz, Barnard College, Columbia University and Carnegie Mellon University we are thrilled to introduce the beta release of RobotPerf™ Benchmarks, an advanced benchmarking suite crafted specifically to evaluate robotics computing performance using ROS 2 as its baseline. In this beta release, we not only showcase new benchmarks and results but also introduce novel visualization capabilities. The complete release is available at https://github.com/robotperf/benchmarks/releases/tag/beta. https://preview.redd.it/5whys5ufudob1.png?width=1562&format=png&auto=webp&s=08a6e22a0b07b26fa6340f59ec9df822ab49c9d0 submitted by /u/pablocarrera [link] [comments]  ( 9 min )
  • Open

    Double DQN do not learn anything
    Hi, i just finished the coursera ml course and i wanted to create myself a double dqn model but my model don't seem to learn anything, it always return very low rewards (-100 to -300) even after playing 2000 episodes. I've been stuck on this for 4 days without any hope to find the solution, any help would be welcome :') thank you in advance import random import numpy as np import gymnasium as gym import tensorflow as tf from collections import deque, namedtuple from tensorflow.keras import Sequential, Input from tensorflow.keras.layers import Dense from tensorflow.keras.optimizers import Adam from tensorflow.keras.losses import MeanSquaredError import matplotlib.pyplot as plt # function creating the models def createModel(inputSize, outputSize): model = Sequential([ Input(inputSize),…  ( 10 min )
    Actor-Critic and other implementations
    I'm confused with several algorithms that are based on an actor-critic approach. In TD3 and SAC, it is understandable that each of them is implemented to serve their purpose (deterministic and stachastic action). But in Dreamer algorithm (DreamerV3), why does it require to combine actor and critic network to the model-based planning approach, as the model-based also able to perform an action by planning to the simulation state. It is mean that using model-based to simulate the possible future then update the critic according to the simulation might sound good in training an agent? submitted by /u/AnnonymeowCat [link] [comments]  ( 9 min )
  • Open

    MediaPipe FaceStylizer: On-device real-time few-shot face stylization
    Posted by Haolin Jia, Software Engineer, and Qifei Wang, Senior Software Engineer, Core ML In recent years, we have witnessed rising interest across consumers and researchers in integrated augmented reality (AR) experiences using real-time face feature generation and editing functions in mobile applications, including short videos, virtual reality, and gaming. As a result, there is a growing demand for lightweight, yet high-quality face generation and editing models, which are often based on generative adversarial network (GAN) techniques. However, the majority of GAN models suffer from high computational complexity and the need for a large training dataset. In addition, it is also important to employ GAN models responsibly. In this post, we introduce MediaPipe FaceStylizer, an effi…  ( 92 min )
  • Open

    Learn how to build and deploy tool-using LLM agents using AWS SageMaker JumpStart Foundation Models
    Large language model (LLM) agents are programs that extend the capabilities of standalone LLMs with 1) access to external tools (APIs, functions, webhooks, plugins, and so on), and 2) the ability to plan and execute tasks in a self-directed fashion. Often, LLMs need to interact with other software, databases, or APIs to accomplish complex tasks. […]  ( 13 min )
  • Open

    AI-driven tool makes it easy to personalize 3D-printable models
    With Style2Fab, makers can rapidly customize models of 3D-printable objects, such as assistive devices, without hampering their functionality.  ( 10 min )

  • Open

    [D] Besides something about LLM, is there any new or interesting research you think is worth reading?
    Please provide Arkiv links. If you want to share your thoughts then go for it. By new I mean within the last 6 months. submitted by /u/I_will_delete_myself [link] [comments]  ( 9 min )
    [D] Good papers on poster collapse in VAEs
    What are some good papers to understand posterior collapse in VAEs? submitted by /u/randomkolmogorov [link] [comments]  ( 9 min )
    [P] Create an Object Detector for Any Game Using YOLO
    Full Video Tutorial: https://www.linkedin.com/posts/moisesdias\_english-version-below-tutorial-crie-activity-7107686497885011969-ZLVW/ Hello everyone! Have you ever thought about how to create an object detection system using YOLO that works with any game? If you're interested, I've created a tutorial with all the steps to develop this system. I'll leave a link to the video where I demonstrate the process step by step using the game Diablo 2 as an example. I hope you enjoy it, and if you have any suggestions, feel free to send a message or comment here! submitted by /u/moisesdepaulodias [link] [comments]  ( 9 min )
    [P] Would anyone know of any information regarding esrb ratings?
    Good Evening, I and some fellow students are taking a SE class and are looking for relevant information regarding esrb ratings and games for a research project. Does anyone know of any data pertaining to relevant esrb info that we can access? We have a few sources and are waiting to hear back from esrb themselves. Would anyone know if they permit web scraping or if there is a csv containing relevant information, or even perhaps an api we could use? Any information would help and thank you all for taking the time to read this. Thanks in advance! submitted by /u/GOD_LIKE_WOW [link] [comments]  ( 9 min )
    [P] Ways to speed up llama-2 summarization on sagemaker?
    I'm currently working on a project to give a quick summary of long articles/conversations. I'm running llama-2-7b-chat-hf with 4bit quantization on a g5.2xlarge instance on sagemaker. The method I'm using is map_reduce (option 2)from this webpage https://python.langchain.com/docs/use_cases/summarization) Of everything I've tried this is the only one that's been able to do decent summaries in a reasonable amount of time. However with really long articles (10,000+ words) it takes ~6 minutes before giving an output. I tried running this same thing on a g5.12xlarge instance which has 4 A10G gpus but it hasn't reduced the time by any noticeable amount. Is there anything else I could be doing to speed this up? submitted by /u/Able_Body_9654 [link] [comments]  ( 9 min )
    [D] what prompt should i use with llama2 for context generative question answering?
    so as the question states, i want to use llama2 to generate an answer for the question based on the context (or the article for more precision), no finetuning is needed, just want to predict the answer, but i can't find what's the right prompt i should use to get a well structured answer. my dataset contains two columns, one for articles, and the other column is for the question, Example: context: article talking about world war 2. question : based on the text, describe how the ww2 had started, and what were the political effects on Europe? submitted by /u/kaoutar- [link] [comments]  ( 9 min )
    [P] Coqui releases XTTS an open-access foundational Voice Cloning model!
    There's a new open-access foundational audio model in town! Standing on the shoulders of TorToiSe TTS - XTTS allows cross-language and multi-lingual speech generation with just 3 lines of code 🐸 Key facts about the model: 1. Supports 13 languages. 2. Voice cloning with just a 3-second audio clip. 3. Emotion and style transfer by cloning. 4. Cross-language voice cloning. Try it out on HF Hub: https://huggingface.co/spaces/coqui/xtts submitted by /u/vaibhavs10 [link] [comments]  ( 9 min )
    [R] Large Language Models for Compiler Optimization - MetaAi 2023 - Autotuner needs 949 CPU-days to achive nearly the same as this approach in 1shot!
    Paper: https://arxiv.org/abs/2309.07062 Abstract: We explore the novel application of Large Language Models to code optimization. We present a 7B-parameter transformer model trained from scratch to optimize LLVM assembly for code size. The model takes as input unoptimized assembly and outputs a list of compiler options to best optimize the program. Crucially, during training, we ask the model to predict the instruction counts before and after optimization, and the optimized code itself. These auxiliary learning tasks significantly improve the optimization performance of the model and improve the model's depth of understanding. We evaluate on a large suite of test programs. Our approach achieves a 3.0% improvement in reducing instruction counts over the compiler, outperforming two state-of-the-art baselines that require thousands of compilations. Furthermore, the model shows surprisingly strong code reasoning abilities, generating compilable code 91% of the time and perfectly emulating the output of the compiler 70% of the time. https://preview.redd.it/f9c7kh7bd9ob1.jpg?width=1530&format=pjpg&auto=webp&s=287fffa714936da9b9a5141b7e01609942416156 https://preview.redd.it/z4a0ce7bd9ob1.jpg?width=1537&format=pjpg&auto=webp&s=e6275b2b53fa6f431b87940784629b3270c656f9 https://preview.redd.it/89toie7bd9ob1.jpg?width=750&format=pjpg&auto=webp&s=9a71bdb2eeeff52b2f8bbb3cf2b678debcd4a060 https://preview.redd.it/0krmqd7bd9ob1.jpg?width=1536&format=pjpg&auto=webp&s=ba3fade0883ee621b185fabc67839db42ea74a53 https://preview.redd.it/8nz00i7bd9ob1.jpg?width=1198&format=pjpg&auto=webp&s=6ddbddf68311f576fbf3c52a47381316feace8c9 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Searching for discussion about chunking algorithms and strategy
    Hi everyone! I'm still experiencing with my own implementation of rag, and I deployed my custom chunking function (honestly don't like the methods on LangChain) . Anyway, I'm searching for alternative methods, algoritms (NLP or not) and models... There are lots of info and different implementation on RAG, but as I can see noone put much effort to augment chunking quality. Also, there are other approach than this one I'm currently using? bi-encoder (instructor) - > cross-encoder (reranking) - > LLM Can someone share some resources, repo, lib or existing implementation of different chunking methods? (or simply discuss here some idea, though or approach) Thanks in advance for you time!! submitted by /u/Distinct-Target7503 [link] [comments]  ( 9 min )
    [D] Gradient descent in regularized least squares
    The problem is obtained from Chapter 3 in Wright, Stephen J and Benjamin Recht (2022). Optimization for data analysis. Cambridge University Press I am solving the problem I attach and I have a doubt in section (f). I have solved all the sections (a)-(e). In section (e) I have obtained that I need $$ k \geq \frac{\lambda_{\text{max}}\left(\frac{2}{N} A^TA + 2\mu I\right)}{\lambda_{\text{min}}\left(\frac{2}{N} A^TA + 2\mu I\right)}log((f(x^0)-f(x_\mu)/\epsilon). $$ However in section (f) asks for a tight upper bound but I only can think about the following bound: $f(\hat{x}) \leq f_\mu(x_\mu) + ||\hat{x}||^2+ \epsilon$, which is very simple. Do you think that I can obtain the result in (e) to obtain another bond, or what would you do? submitted by /u/ItsGauss [link] [comments]  ( 9 min )
    [P] Guide: Implementing ImageNet classification using Deep CNNs Paper.
    Need help on how to get started with implementing a research paper. I'm implementing the Imagenet classification task paper for my final year undergrad mini-project. Any advice is appreciated on how to get started? I have mid-level machine learning knowledge and am ready to pick the required concepts on the go. Please help. Thank you :) Link: https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf submitted by /u/DrBeans0 [link] [comments]  ( 9 min )
    [D] Use LLM to analyse and port software written in C (very long files)
    Hi, I'm trying to figure out what is the best way to use LLMs to analyse a very old software entirely written in C. I've tried to to some basic prompts with ChatGPT and it seems to recognise the language. The situation is that I've many .c files with thousands of lines and with a lot of redundant code. Moreover, since there are a lot of data structure with variables names not easily understandable, I need to provide some context to allow the model to trying to understand what the code does. My worry is that providing all the needed info + the file itself (even considering 1 file at a time) I could consume all the model context and therefore not leave room for generating anything of value. Has anyone had the opportunity to face similar problems? Ideas? Cheers Alexio submitted by /u/Alexioc [link] [comments]  ( 9 min )
    [N] MIT-IBM Watson AI Lab releases MoLM suite with three small sparse MoE models, the largest of which (8B params with 700M experts) performs on par with Pythia 2.8B while its throughput is comparable to Pythia 1.4B
    Paper: https://arxiv.org/abs/2306.04640 GitHub: https://github.com/ibm/moduleformer (under Apache 2.0) Twitter thread: https://twitter.com/Yikang_Shen/status/1702041129267388678 Abstract: Large Language Models (LLMs) have achieved remarkable results. However, existing models are expensive to train and deploy, and it is also difficult to expand their knowledge beyond pre-training data without forgetting previous knowledge. This paper proposes a new neural network architecture, ModuleFormer, that leverages modularity to improve the efficiency and flexibility of large language models. ModuleFormer is based on the Sparse Mixture of Experts (SMoE). Unlike the previous SMoE-based modular language model, which requires domain-labeled data to learn domain-specific experts, ModuleFormer can i…  ( 10 min )
    Assigning inbalanced labels to "Other" class in scikitlearn [P]
    Hey there, I wasn't doing any ml in some time and forgot basics. I was thinking that you may help me. So I trained svc model on small dataset (around 1400 unical records). I have 13 classes, which are badly distributed in the training set and inbalanced. 3 classes took around 80% of all. What the business wants is keep the 3 classes and categorize the rest as "Other" class. In the future they may be able to generate more training data for the remaining labels. How should I do it? I know I can assign everytning to "Other" class, before training with simple if then formula, but dont know if this is the right approach. Any sugestions? I know this may be some case of 1 vs all case, but don't know which exactly. Thanks in advance for any help. submitted by /u/th00masml [link] [comments]  ( 9 min )
    [D] The ML Papers That Rocked Our World (2020-2023)
    Hey everyone! 👋 I’ve been on a bit of a deep-dive lately, trying to catch up on all the awesome stuff that’s been happening in the ML space. It got me wondering, from 2020 to 2023, what have been the absolute must-read papers that shook the foundations and got everyone talking? Whether it’s something that reinvented the wheel in your specific niche or just made waves industry-wide, I wanna hear about it! I’m curious to see how different the responses will be, and hey, this might even become a go-to list for anyone looking to get the lowdown on the hottest trends and discoveries of the past few years. Can’t wait to hear your thoughts! tl;dr I decided to aggregate your best suggestions into categories for anyone interested in reading them without searching through the whole comment se…  ( 10 min )
    [P] Llama2 inference in a single file of pure Mojo
    Hi everyone! I was really excited that Mojo became publicly available and thinking which project can I implement to learn Mojo concepts. Since I have already ported llama2.c to pure Python, I decided why not try to port llama2.py to Mojo now.. And here is what I got First round of llama2.c vs llama2.🔥 battle. Mojo demonstrated 20% better performance than C in a single threaded execution of llama2 inference and 250x times better performance than Python https://i.redd.it/0gcwwfc2r7ob1.gif For reference Mojo is using SIMD vectorization, that's why it's performing great for matmul operations. In the other hand, it turned out that gcc also aggressively optimizes all for loops it can find, so I suggest this benchmark is pretty fair. ​ Mojo natively supports SIMD vectorizations out of the box gcc aggressively vectorizing loops ​ After that I decided to compare both solutions in multi-threaded (parallelized) mode, and now `llama2.c` strike back with help of OMP demonstrating 20% better performance than Mojo ​ https://i.redd.it/gwymffods7ob1.gif I hope this post will be useful for all Machine Learning engineers/enthusiasts/students out there, ensuring we're up-to-date with Modular/Mojo's game-changing AI tech developments. Stay informed and ahead of the curve! Links llama2.🔥: https://github.com/tairov/llama2.mojo llama2.python: https://github.com/tairov/llama2.py llama2.c: https://github.com/karpathy/llama2.c Modular_AI repost in Twitter: https://twitter.com/tairov/status/1701345271752343900 feel the magic on HF: https://huggingface.co/spaces/radames/Gradio-llama2.mojo submitted by /u/Albatross9855 [link] [comments]  ( 9 min )
    [D] Training an LLM Model: AWS p3.2xlarge EC2 instance vs. Multiple RTX 4090s at Home?
    Hello everyone, I'm currently at a crossroads with a decision that I believe many in this community might have faced or will face at some point: Should I use cloud-based GPU instances like AWS's p3.2xlarge EC2 (with Tesla V100) or invest in building a high-performance rig at home with multiple RTX 4090s for training a large language model? Context: I run a startup and we're currently fine-tuning an open source LLM, and the computational demands are of course high. We want to make an informed choice between using AWS's offerings or setting up a high-performance system at home to start. Cloud Option: AWS p3.2xlarge EC2 Cost: Approximately $3.06 per hour. Specifications: One Tesla V100 GPU, 8 vCPUs, 61 GiB RAM. Pros: Scalability, reliability, specialized software optimizations. Cons: Recurring costs, potential limitations on customization. Home Rig Option: Multiple RTX 4090s Cost: Around $1,600 for each 4090, but I'd own them. Specifications: Even higher TFLOPs than a V100, and memory isn't a constraint (24GB per card). Pros: One-time investment, flexibility, potentially higher raw computational power. Cons: Need to handle cooling, power, and system integration myself I'd love to hear your thoughts, experiences, and recommendations. Here are some specific questions: Performance: How many RTX 4090s would roughly equal the computational power of an AWS p3.2xlarge instance for ML tasks? Cost-Effectiveness: Given that we're a startup with limited resources, does it make more financial sense to invest upfront in hardware? Reliability and Maintenance: For those who have run multi-GPU setups at home, how reliable are they, and what maintenance work is required? Other Considerations: Are there factors I haven't considered that you think are critical? Thank you in advance for sharing your insights! submitted by /u/devolvedai [link] [comments]  ( 9 min )
    [R] Scaling Data-Constrained Language Models - Hugging Face et al. 2023
    Paper: https://arxiv.org/abs/2305.16264 GitHub: https://github.com/huggingface/datablations License: All models & code are licensed under Apache 2.0. Filtered datasets are released with the same license as the datasets they stem from. Abstract: The current trend of scaling language models involves increasing both parameter count and training dataset size. Extrapolating this trend suggests that training dataset size may soon be limited by the amount of text data available on the internet. Motivated by this limit, we investigate scaling language models in data-constrained regimes. Specifically, we run a large set of experiments varying the extent of data repetition and compute budget, ranging up to 900 billion training tokens and 9 billion parameter models. We find that with constrained data for a fixed compute budget, training with up to 4 epochs of repeated data yields negligible changes to loss compared to having unique data. However, with more repetition, the value of adding compute eventually decays to zero. We propose and empirically validate a scaling law for compute optimality that accounts for the decreasing value of repeated tokens and excess parameters. Finally, we experiment with approaches mitigating data scarcity, including augmenting the training dataset with code data or removing commonly used filters. Models and datasets from our 400 training runs are freely available at this https URL. ​ https://preview.redd.it/ahzyonnqe7ob1.png?width=1015&format=png&auto=webp&s=0e0cb4051e390ea23440cd61bfc0bbf5fce83bb7 https://preview.redd.it/l6a81onqe7ob1.png?width=1014&format=png&auto=webp&s=a36b74cbb510a1f753ef1b891531bb36ab643246 https://preview.redd.it/yyu8h0oqe7ob1.png?width=1001&format=png&auto=webp&s=047cb2bb1932c6215cea0c30e22fd9bbe60391a8 https://preview.redd.it/xskcytnqe7ob1.png?width=1007&format=png&auto=webp&s=4090e92dd9eacb377840327bb7d0ae69ff752b52 submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
    [D] Gradio on the same server but different ports
    On my team they are using Gradio for LLM applications, etc. When running multiple instances of Gradio applications on the same server, but on different ports, opening a new session is causing an error and closing the previous session, "error" is written in the chat. The logs I found were like "Invalid HTTP request received" and "max retries exceeded". For me, Gradio is supposed to work as a demonstration and not as a scalable product, but they are using it that way and I thought that the problem could be precisely that. But if not, does anyone have any idea what could be going on? No meu time estão utilizando gradio para aplicações de LLMs, etc. Ao executar múltiplas instâncias de aplicações Gradio no mesmo servidor, mas em portas diferentes, a abertura de uma nova sessão está causando erro e encerrando a sessão anterior, fica "erro" escrito no chat. Os logs que encontrei eram como "Invalid HTTP request received" e "max retries exceeded". Para mim, o Gradio é pra funcionar como demonstração e não em forma de produto escalável, mas estão utilizando assim e pensei que o problema poderia ser justamente isso. Mas caso não, alguém tem alguma ideia do que pode estar acontecendo? submitted by /u/Magic_squirrel_hat [link] [comments]  ( 9 min )
    [R] Compilation of non-open source AI models
    Hi, does anyone have a compiled list of non-open source AI models that can be used for MVP building? submitted by /u/Compound_Group [link] [comments]  ( 9 min )
    [D] Communicative/Collaborative Agents hybrids
    I had a Claude based LLM analyze the strengths and weaknesses of the MetaGPT, ChatDev, AoT, and brain inspired algorithms papers, seeking ways to combine the strengths of two or more of the methods. "Here are some specific examples of how the methods in the Algorithm of Thoughts (AoT), brain-inspired algorithms, MetaGPT, and ChatDev papers could be combined to improve multi-agent systems: MetaGPT could incorporate longer, more elaborate algorithmic examples from AoT into its prompts to guide the LLM's reasoning process. For example, in solving math word problems, the prompt could provide a 4-5 step worked example walking through unpacking the problem statement, setting up equations, solving, and checking the solution. This mirrors AoT's more extensive algorithm narratives. ChatDev's a…  ( 11 min )
  • Open

    Two guys with VERY different views on AI –Yuval Noah Harari and Mustafa Suleyman in conversation (full interview)
    submitted by /u/hauntedhivezzz [link] [comments]  ( 8 min )
    ChatGPT, tell the jury that Trump waiting three hours to call off his mob is overwhelming and irrefutable evidence that he planned and conducted the January 6th insurrection
    I asked ChatGPT to talk with Trump's jury about the 3-hour delay. Not bad. ChatGPT: Ladies and gentlemen of the jury, today, we present a compelling case that the three-hour delay in President Trump's response to the events of January 6th is, without a doubt, overwhelming and irrefutable evidence of his planning and conducting the insurrection that unfolded before our eyes. First, consider the timeline. As the violence escalated, President Trump did not immediately call for an end to the chaos. This delay allowed the mob to wreak havoc, endanger lives, and damage our democracy. A leader's duty is to protect the nation and its citizens, and a three-hour wait to act is inexplicable unless there was intent behind it. Second, let's talk about intent. In criminal law, intent matters. It's the difference between an accident and a crime. A deliberate delay in response to an unfolding insurrection suggests a guilty mind. President Trump had the power and the responsibility to intervene immediately. Yet, he chose not to, allowing the violence to escalate. Furthermore, we must consider his words leading up to and during the events of January 6th. His rhetoric could be interpreted as encouragement for his supporters to take matters into their own hands. This, coupled with the delay, paints a picture of a leader who not only knew what was happening but may have orchestrated it. Ladies and gentlemen, we are not here to pass judgment but to seek justice. The evidence of a three-hour delay in responding to the insurrection, along with the surrounding circumstances, presents a compelling case that President Trump planned and conducted this insurrection. It is our duty to hold individuals accountable for their actions, regardless of their position. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    How good a crime sleuth and judge is today's AI?
    How would today's AI perform as a crime investigator or judge? To find out, I asked Bard the following question: The film Social Network suggests that Mark Zuckerberg stole his idea for Facebook from two brothers who hired him to make an app for the idea. How likely is this to be true? Bard seemed informed and sharp: "The film The Social Network is a dramatization of the founding of Facebook, and as such, it takes some liberties with the truth. However, the film does accurately portray the fact that Mark Zuckerberg was sued by the Winklevoss twins, Cameron and Tyler, who claimed that he stole their idea for a social networking website called HarvardConnection. The Winklevoss twins allege that they met with Zuckerberg in December 2003 to discuss their idea for HarvardConnection, and tha…  ( 10 min )
    Which AI to use for this idea?
    Hi there! I'm participating in a Hackathon at work and am wondering if this community can give me some direction. Every year our company receives mandates / updates from different parties. In turn, our development teams have to parse through heavy documentation to figure out what needs to change in our code. Ingesting the data is what takes the longest. Our goal is to feed the mandates documentation through an AI and have it return what is needed to be changed in our code. For example, something might say field 200 now needs to include a 6 digit date format vs the 4 digit date format we've had in years past. We have secured a license for Azure AI but honestly no idea if that is the right AI to use. I youtubed a bunch of videos on document processing but I'm also not sure if that is what we are trying to do. Any advice on this is much appreciated. submitted by /u/HillyjoKokoMo [link] [comments]  ( 9 min )
    Looking for a meeting assistant
    I'm looking for a program that will transcribe live audio playing from my computer (windows). Do you know something like that? I've seen Buzz (https://chidiwilliams.github.io/buzz/docs/usage), but it needs an audio loopback driver in order to work, so I wonder if there are others. 🚀 Bonus points if it recognizes different people talking. 🚀 Extra bonus points if it can transcribe multiple languages. submitted by /u/AleHoju [link] [comments]  ( 9 min )
    I signed up for a debate on ai.
    So today I signed up for a debate on ai. Wheather ai is beneficial or dangerous to human beings. I have the freedom to choose any side. This debate will be watched by about 130 people, all cs freshmen (mind you, also my first time speaking in front of this many people). Now, I'm confident I know more about ai than an average person but I need your help in preparing properly. Which side do I take and what are all the points I should keep in mind? It's 4 pm here and the debate is tomorrow. Any help will be appreciated. Thank you. submitted by /u/CalmGuy69 [link] [comments]  ( 9 min )
    Artificial Intelligence May Be Humanity’s Most Ingenious Invention–and Its Last?
    Artificial intelligence (AI) has the potential to solve the world's problems or destroy humanity. It is being developed by a few hundred individuals in Silicon Valley, and we have little say in its ethics or existence. AI has already demonstrated creative abilities in painting, writing, and music. It is also being used in drug discovery, therapy, dating apps, and misinformation in politics. The rapid adoption of AI raises concerns about job displacement and the potential for catastrophic events. Experts predict a significant chance of AI causing a catastrophe or even wiping out humanity. Source : https://www.vanityfair.com/news/2023/09/artificial-intelligence-industry-future Summarized by Nuse AI submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Generative AI Consumer Landscape by a16z
    In less than a year since ChatGPT was introduced, it has become the fastest consumer application to register 100 million monthly active users. But how are consumers using other GenAI products apart from ChatGPT? An a16z Consumer report examines the top 50 GenAI web products (based on SimilarWeb data) to find out. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/f0kh5qag16ob1.png?width=2058&format=png&auto=webp&s=1cab11a31d384c068912c9fca32a000393b795d5 Proprietary models are dominating: 80% of the top 50 GenAI products didn’t exist a year ago—suggesting many of the most compelling consumer experiences are completely novel. Interestingly, 48% of these are bootstrapped—with no outside funding. ChatGPT holds a massive lead: ChatGPT alone accounts for 60% of the entire list's monthly traffic, with roughly 1.6 billion visits and 200 million monthly users as of June 2023. LLM assistants are dominating: LLMs, including Google’s Bard and Quora’s Poe, constitute 68% of total consumer traffic to the top 50. The other categories seeing significant traffic are AI companions and content-generation tools. GenAI marketing is mostly organic: Marketing for most of these products has been reliant on referrals, word of mouth, and other traditional marketing as they enter the market. About 90% of these companies are already monetizing, and most do so via a subscription model. GenAI and mobile adaptability: Given the extensive consumer time spent on mobile, an increase in mobile-first GenAI products is expected as the technology evolves. (source) P.S. If you like this type of analysis, sign up for my free newsletter that deciphers the fastest-moving news and research in AI and tech. Professionals from Google, Meta, and OpenAI are already on board. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/14/2023
    Adobe’s Firefly generative AI tools are now widely available.[1] Stability AI, gunning for a hit, launches an AI-powered music generator.[2] Elon Musk warned of civilizational risks posed by artificial intelligence. Sundar Pichai of Google highlighted the technology’s potential to solve health and energy problems. And Mark Zuckerberg of Meta stressed the importance of open and transparent A.I. systems.[3] German military plows millions into AI ‘environment’ for weapons tests that could change combat forever.[4] Amazon launches generative AI to help sellers write product descriptions.[5] Sources: [1] https://www.theverge.com/2023/9/13/23871537/adobe-firefly-generative-ai-model-general-availability-launch-date-price [2] https://techcrunch.com/2023/09/13/stability-ai-gunning-for-a-hit-launches-an-ai-powered-music-generator/ [3] https://www.nytimes.com/2023/09/13/technology/silicon-valley-ai-washington-schumer.html [4] https://www.foxnews.com/world/german-military-plows-millions-ai-environment-weapons-tests-change-combat [5] https://www.aboutamazon.com/news/small-business/amazon-sellers-generative-ai-tool submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    I'm very new in this field (Prompt Engineering) and have a handful of questions, any advice and thoughts are welcome!!
    My experience in CS, coding, and programming is very minimal. I understand general concepts but only through the lens of a degree in physics that required I mess around with WolframAlpha sometimes (which I really enjoyed). I've been considering getting a second degree in CS or something related but want to find a clear(ish) path before committing to it. I would love to hear any related thoughts as well! ​ Prompt Engineering seems like a pretty fresh field of study. Is it up and coming as a career path? People that specialize in this, what field(s) did you come from and how would you recommend diving into it? Considering my experience, would I be able to actually find work, freelancing or employed? How on earth do I get started in this world? It seems so insanely big and complicated but I am just fascinated by the idea of using written dialogue to manipulate the output of an LLM! From my very high overview of PE and the recent advances in AI, PE as a field of study and interest is going to expand exponentially, is that accurate? submitted by /u/Top_Room_6714 [link] [comments]  ( 9 min )
  • Open

    On-device content distillation with graph neural networks
    Posted by Gabriel Barcik and Duc-Hieu Tran, Research Engineers, Google Research In today's digital age, smartphones and desktop web browsers serve as the primary tools for accessing news and information. However, the proliferation of website clutter — encompassing complex layouts, navigation elements, and extraneous links — significantly impairs both the reading experience and article navigation. This issue is particularly acute for individuals with accessibility requirements. To improve the user experience and make reading more accessible, Android and Chrome users may leverage the Reading Mode feature, which enhances accessibility by processing webpages to allow customizable contrast, adjustable text size, more legible fonts, and to enable text-to-speech utilities. Additionally, An…  ( 93 min )
  • Open

    Does C++ in mujoco have benefits over python for reinforced learning?
    I'm trying to build a humanoid model and then train it to perform some tasks , I have decided to go with mujoco for the simulation and now I'm wondering if I should use the C++ API or the python one. the python implementation says it uses C API but is it good? Also if it's slower than the c++ one how slow is it? I'll probably have to make something real time and hence can't compromise much on the speed, but if it's only and small amount it's acceptable. would really appreciate some guidance in this matter thank you submitted by /u/rakk109 [link] [comments]  ( 9 min )
    Algorithmic pricing environments for RL
    Hello, I am looking for environments to test out some ideas I have for algorithmic pricing. By algorithmic pricing environments, I mean there are multiple competing algorithms trying to maximize profits. I can't really find any out of the box implementations. There are trading environments but those are not what I am looking for. Any help would be appreciated, thanks. submitted by /u/Next_Gap8224 [link] [comments]  ( 9 min )
    Some confusion about using mocap in Mujoco
    Hi! Recently, I tried to follow fetch_pick_and_place.env in gymnasium_robotics to build a similar environment with Franka. I found that the core of this implementation is to use the mocap to control the end-effector, and then mocap derives joint angles using the built-in inverse kinematics algorithm. For the fetch_pick_and_place.env, mocap does not cause mutations and oscillations in configuration space. However, when I use mocap to control Franka, oscillations in joint space occur frequently, although I've minimized the step size of the mocap to ensure that the movement of the end-effector in Cartesian space is minor. Fetch and Franka are both redundant arms, I don't know why there is such a big difference in mocap performance. Here is the video to illustrate the above phenomenon Franka I've opened issues on mujoco and gymnasium robotics repositories, but it didn't initiate any discussion. Any help would be appreciated! Thanks! ​ submitted by /u/UpperSearch4172 [link] [comments]  ( 9 min )
  • Open

    Build a classification pipeline with Amazon Comprehend custom classification (Part I)
    In first part of this multi-series blog post, you will learn how to create a scalable training pipeline and prepare training data for Comprehend Custom Classification models. We will introduce a custom classifier training pipeline that can be deployed in your AWS account with few clicks.  ( 10 min )
    Fine-tune Falcon 7B and other LLMs on Amazon SageMaker with @remote decorator
    Today, generative AI models cover a variety of tasks from text summarization, Q&A, and image and video generation. To improve the quality of output, approaches like n-short learning, Prompt engineering, Retrieval Augmented Generation (RAG) and fine tuning are used. Fine-tuning allows you to adjust these generative AI models to achieve improved performance on your domain-specific […]  ( 8 min )
    Simplify access to internal information using Retrieval Augmented Generation and LangChain Agents
    This post takes you through the most common challenges that customers face when searching internal documents, and gives you concrete guidance on how AWS services can be used to create a generative AI conversational bot that makes internal information more useful. Unstructured data accounts for 80% of all the data found within organizations, consisting of […]  ( 14 min )
  • Open

    Yocto, ROS 2, and Hardware Acceleration: A Production-Grade Trio for Robotics
    submitted by /u/pablocarrera [link] [comments]  ( 8 min )
  • Open

    Chinese Remainder Theorem synthesis algorithm
    Suppose m = pq where p and q are large, distinct primes. In the previous post we said that calculations mod m can often be carried out more efficiently by working mod p and mod q, then combining the results to get back to a result mod m. The Chinese Remainder Theorem assures us that […] Chinese Remainder Theorem synthesis algorithm first appeared on John D. Cook.  ( 5 min )
    Gaining efficiency by working modulo factors
    Suppose m is a large integer that you are able to factor. To keep things simple, suppose m = pq where p and q are distinct primes; everything in this post generalizes easily to the case of m having more than two factors. You can carry out calculations mod m more efficiently by carrying out […] Gaining efficiency by working modulo factors first appeared on John D. Cook.  ( 5 min )
    Group theory and RSA encryption
    RSA encryption a map from numbers mod n to numbers mod n where n is a public key. A message is represented as an integer m and is encrypted by computing c = me mod n where e is part of the public key. In practice, e is usually 65537 though it does not have […] Group theory and RSA encryption first appeared on John D. Cook.  ( 7 min )
    RSA encrypted messages that cannot be decrypted
    Not all messages encrypted with the RSA algorithm can be decrypted. This post will show why this is possible and why it does not matter in practice. RSA in a nutshell RSA encryption starts by finding two large primes, p and q. These primes are kept secret, but their product n = pq is made public. […] RSA encrypted messages that cannot be decrypted first appeared on John D. Cook.  ( 6 min )
  • Open

    Microsoft at ACM SIGCOMM 2023: Innovating the future of networking
    Modern applications heavily rely on robust network infrastructure, requiring continuous innovation. In this evolving landscape, Microsoft is at the forefront, spearheading innovation efforts in networking and strengthening the foundational network infrastructure that underpins the cloud ecosystem. By investing in and enhancing this critical infrastructure, Microsoft not only ensures the resilience and scalability of cloud services […] The post Microsoft at ACM SIGCOMM 2023: Innovating the future of networking appeared first on Microsoft Research.  ( 10 min )
    AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens
    What’s the driving force behind AI’s recent, rapid progress? Research manager Ahmed Awadallah shares his insights on this, the two-stage approach to training large-scale models, and the need for better model evaluation in this episode of the #MSRPodcast. The post AI Frontiers: The future of scale with Ahmed Awadallah and Ashley Llorens appeared first on Microsoft Research.  ( 31 min )
  • Open

    Are data science certifications the gateway to competitive pay?
    Working as a data scientist is the dream of many IT professionals these days. It is no secret that data science is a skyrocketing field attracting young professionals and inspiring many to switch careers to data science. On one front are young professionals who study their courses in colleges to pursue their dream of becoming… Read More »Are data science certifications the gateway to competitive pay? The post Are data science certifications the gateway to competitive pay? appeared first on Data Science Central.  ( 19 min )
    CUPED for starters: Enhancing controlled experiments with pre-experiment data
    CUPED: Improve Your A/B Testing - Detect Smaller Gains, Utilise Smaller Samples and Make Smarter Decisions! The post CUPED for starters: Enhancing controlled experiments with pre-experiment data appeared first on Data Science Central.  ( 26 min )
    Searching for sustainability in data center cooling
    Data centers are known for their impact on the environment. They run 24/7 and exude a lot of heat. Massive warehouses full of hot technology require advanced cooling systems or an HVAC system pushed to its limit.  Data center managers and sustainability leaders no longer settle for antiquated techniques. They’re striving to develop greener and… Read More »Searching for sustainability in data center cooling The post Searching for sustainability in data center cooling appeared first on Data Science Central.  ( 20 min )
    Collaborative visual knowledge graph modeling at the system level
    The best way to model business and consumer dynamics is collaboratively, with stakeholders all in the same virtual room contributing. Of course, this has been happening asynchronously for some time now, but the potential exists for more real-time interaction.  Modelers don’t work in a vacuum, of course. The iterations between a modeler who develops a… Read More »Collaborative visual knowledge graph modeling at the system level The post Collaborative visual knowledge graph modeling at the system level appeared first on Data Science Central.  ( 20 min )
  • Open

    Shout at the Devil: Capcom’s ‘Devil May Cry 5’ Joins GeForce NOW
    GFN Thursday is downright demonic, as Devil May Cry 5 comes to GeForce NOW. Capcom’s action-packed third-person brawler leads 15 titles joining the GeForce NOW library this week, including Gears Tactics and The Crew Motorfest. It’s also the last week to take on the Ultimate KovaaK’s Challenge. Get on the leaderboard today for a chance Read article >  ( 6 min )
  • Open

    A pose-mapping technique could remotely evaluate patients with cerebral palsy
    The machine-learning method works on most mobile devices and could be expanded to assess other motor disorders outside of the doctor’s office.  ( 10 min )

  • Open

    [D] OpenGL Library Error
    Has anyone ever encountered this error while working with the OpenGL library? raise ImportError("Unable to load OpenGL library", *err.args) ImportError: ('Unable to load OpenGL library', "Could not find module 'OSMesa' (or one of its dependencies) I have done pip install opengl and pyopengl, but it doesn't seem to solve the problem. Specifically, I am trying to run this code: https://github.com/brjathu/LART submitted by /u/BigDreamx [link] [comments]  ( 9 min )
    [D] Wandb remote agent source code managing
    Hi all, I'm new to machine learning and have decided to use a combination of Stable baselines3 and Wandb. I'm at the point where I'm running sweeps using Wandb and want to utilize another PC I have laying around to run agents on. What is the best way to get my python code for the agent to run onto the spare PC? I know I can manually load copy the code over, but I'm looking for a more efficient method that will maintain any changes made to the source code. Maybe packaging up the python code within the sweep and having the agent download and execute it? I'm not all too familiar with possibilities and limits of trying to achieve this so any and all input is appreciated. Thanks! submitted by /u/chip_fork [link] [comments]  ( 9 min )
    [D] Mean scores or appending all the predictions in cross-validation for model performance evaluation
    I have this question that I cannot seem to settle in my head. All papers that I read, report the average (std) performance results across each folds when they report F1, Precision, etc.. Somebody that I highly trust in ML (somebody with a PhD in the field) was reporting the results after saving all the predicted labels (y_pred) and actual labels (y_true) to a list and compute the F1 score one time with the pooled predictions. I now am working on a dataset (binary classification) and trying to validate my model using leave-one-subject-out CV (Some people in my dataset have more lines than others). When I take the average of all iterations I get poor results (F1 score= 0.5), but if I pool all the predictions and compute the F1 score at the end, I achieve decent performance (F1 score =0.7). So, in my project, it is in my best interest to use the second approach, and somebody that I trust tells me that it is okay to do this approach. But I cannot seem to find a paper that says that this approach is acceptable or good. What do you guys think and do you have any suggestions? submitted by /u/enthusiastic31 [link] [comments]  ( 9 min )
    [P] Looking for Efficient Encoding Methods for Java Method Names for Downstream Classification Task
    I'm working on a project that involves analyzing large samples of Java codes. My end goal is to perform classification based on these codes. For this, I've been trying to efficiently encode the full names of the methods (in the package.class.methodname format) in the Java code. Currently, I am experimenting with doc2vec. I'm treating the components of each method's full name (separated by dots) as individual documents. This allows me to produce vectors for each method name, and I evaluate the results by computing the cosine similarity between pairs of similar method names. The results were not good so far. Before moving to doc2vec, I tried using an LLM which gave me good results. However, the inference time was far too long, especially given the scale at which I'm working. I also considered using a Bag of Words model, but quickly realized it wouldn't be effective. Many of the method names in my samples are obfuscated, making this approach unsuitable. The issue I'm facing is that using the direct method names as features is not generalizing well for classification. A slight change in a method's name results in losing that particular feature, making my model fragile. What are some optimal encoding methods for small sentences (around 5-6 words) like these method names, that can scale well? Also, Are there any specialized encoding techniques tailored for software code that I could use for this task? Any suggestions or insights would be really helpful. Thank you! submitted by /u/Practical_Mango_8720 [link] [comments]  ( 9 min )
    [R] Efficient Memory Management for Large Language Model Serving with PagedAttention - UC Berkeley et al 2023 - 2-4x higher throughput than HuggingFace Transformers without requiring any model architecture changes!
    Paper: https://arxiv.org/abs/2309.06180 Github: https://github.com/vllm-project/vllm Blog: https://vllm.ai/ Abstract: High throughput serving of large language models (LLMs) requires batching sufficiently many requests at a time. However, existing systems struggle because the key-value cache (KV cache) memory for each request is huge and grows and shrinks dynamically. When managed inefficiently, this memory can be significantly wasted by fragmentation and redundant duplication, limiting the batch size. To address this problem, we propose PagedAttention, an attention algorithm inspired by the classical virtual memory and paging techniques in operating systems. On top of it, we build vLLM, an LLM serving system that achieves (1) near-zero waste in KV cache memory and (2) flexible sharing of KV cache within and across requests to further reduce memory usage. Our evaluations show that vLLM improves the throughput of popular LLMs by 2-4× with the same level of latency compared to the state-of-the-art systems, such as FasterTransformer and Orca. The improvement is more pronounced with longer sequences, larger models, and more complex decoding algorithms. https://preview.redd.it/x8w8ckejv2ob1.jpg?width=667&format=pjpg&auto=webp&s=28fae778b67ac28fc72d084f071b12c92cb5ea07 https://preview.redd.it/ctlrqpejv2ob1.jpg?width=1468&format=pjpg&auto=webp&s=31755d169673ee5d30efa3f05bd6cb10813b328d https://preview.redd.it/z5r7knejv2ob1.jpg?width=1504&format=pjpg&auto=webp&s=9ceb5370aa5a7cc0688fe9a3771a0328262c3a01 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] LLM for viral tweet generation
    Problem: Given a database of the most viral tweets (of a certain shared category), I am hoping to use LLM's to generate further viral tweets. Currently I am seeing this as a synthetic data generation problem: two approaches I am thinking of is 1) grounding (using viral examples to guide the prompt) and 2) filtering (finetuning an LLM to predict virality and filtering for the most viral generations) I want to ensure that the output retains the "viral" structure/style and is diverse/new (i.e no copies). Any general directions or references are appreciated submitted by /u/greatSWE [link] [comments]  ( 9 min )
    [D] 1is Msc Math enough to secure jobs in R&D sector for AI/ML?
    I'm currently doing Integrated MSc in Mathematics (in India) and until now I've done a remote research intern in a French university and I'll do a research intern at a French research laboratory (INRIA-LORIA) next year, I want a job (in India) at the R&D sector (Data Scientist or ML Engineering anything would do) Idk if research interns are as valuable as Industry internships when it comes to R&D? Basically after my masters, I'll probably have only these two internships to show as a work experience, Probably won't be able to get an internship in an IT company unless i opt for remote work (which is also not guaranteed atm). Mu question, generally is Msc + 2 research internships + 1or 2 publications good enough to secure a decent job in R&D? submitted by /u/Emotional-Zebra5359 [link] [comments]  ( 9 min )
    Can I work later as an ML engineer? [D]
    Hello!! I have a BSc in Mathematics and currently I'm going to start a MSc in FinTech ( it has 3 courses out of 9 that have to do with ML, NN and many more ). Since I am really fascinated about programming ( didn't like it when I was at Maths department because of teachers and I am starting learning on my own through courses and we will have many programming languages in my masters degree ex. R,Python, SQL and others) and I would start as a data scientist at the beginning, could I through years of experience later ( ideally 1-2 years later, provided that I have a solid and good programming skills and projects) become a ML engineer? ( Now of course I can't become one because I know that it is difficult since I am competing with people that have CS degrees). submitted by /u/math-is-cool-62 [link] [comments]  ( 9 min )
    "[P]" Seeking technical Co-Founder: Private Equity SaaS Startup
    Hi there! I bring 2 years of experience from a European investment fund and a solid idea for a B2B SaaS solution targeting private equity investment funds. This market is notoriously challenging to penetrate without insider knowledge. The timing couldn't be better. Similar kind of software is currently sold €20k per user by a semi-monopolistic boomer company that is ready to be disrupted. Offer: - Equity shared equally. - Ready to quit my job and go full-time on it, if I find the right co-founder. Ideal Partner: Seeking someone proficient in SQL for handling large datasets and able to integrate OpenAI's API within such datasets (technical AI skills required). While I'm based in Berlin and prefer a European co-founder, it's not mandatory, but meeting in person is a must before we commit. Don't be afraid to DM me if intrigued! Together, we can make it happen. Let's revolutionize the sector! submitted by /u/Sudden_Possible489 [link] [comments]  ( 9 min )
    Books for machine learning. [D]
    İ am lookong for the pdfs about machine learning, maths for ML, ml projects. İs there any sites i can find pdf like that? submitted by /u/Necessary-Car-5080 [link] [comments]  ( 9 min )
    [D] Tensorflow Dropped Support for Windows :-(
    Hey, I've been using TF pretty much my whole deep learning career starting in 2017. I've also used it on Windows the entire time. This was never a major issue. Now when I tried (somewhat belatedly) upgrading from 2.10 to 2.13, I see the GPU isnt being utilized and upon further digging see that they dropped Windows GPU support after 2.10: "Caution: TensorFlow 2.10 was the last TensorFlow release that supported GPU on native-Windows. Starting with TensorFlow 2.11, you will need to install TensorFlow in WSL2, or install tensorflow or tensorflow-cpu and, optionally, try the TensorFlow-DirectML-Plugin" This is really upsetting! Most of the ML developers I know actually use Windows machines since we develop locally and only switch to Linux for deployment. I know WSL is an option, but it (1) can only use 50% RAM (2) doesnt use the native file system. I feel very betrayed. After sticking with, and even advocating for Tensorflow when everyone was (and still is) switching to PyTorch, TF dropped me! This is probably the final nail in the coffin for me. I will be switching to PyTorch as soon as I can :-( -Disgruntled user submitted by /u/rsandler [link] [comments]  ( 9 min )
    [R] --Research Participants Required-- Age perception of AI generated infant faces compared to real infant faces. (Suitable for everyone) (Available for Laptop/PC and Tablet devices only)
    https://research.sc/participant/login/dynamic/A1D66883-6E8F-409B-8EF9-AC989A76C7E9 Psychology researchers at Swansea University are carrying out an original study to see whether artificial intelligence is able to generate infant faces between the ages of 0 and 7 years. AI generated pictures will be presented alongside real faces from an existing face database, with a sliding scale underneath that you will use to roughly estimate the age of the child’s face. The experiment should only take between 15 and 20 minutes Thank you for your time. submitted by /u/Logipsychlical [link] [comments]  ( 9 min )
    [D] Will be presenting a talk on Data Pre-processing in Deep Learning - what would be the topics, notebooks or datasets would you include if you would be giving such talk?
    So I'll be presenting a talk on Data pre-processing in deep learning in my city's Keras Community Day, and I am still thinking about all the content I want to present there. What I want from this session is to present different ways of preprocessing the data for a deep learning model. I want to show different types of implementations, how those affect the final trained model, when to use which type of data preprocessing and things similar to this. It would be great if you can suggest me some topic, notebooks or datasets for the same. All the notebooks that show good implementation and affect of data preprocessing are absolutely welcome. Also, as this is **Keras** Community Day, I'll have to include more about data preprocessing using Keras and less about other libraries. Also, if you could help me with this: I am confused between showing preprocessing using layers or doing the preprocessing without layers. I know this sounds vague, but if you have any idea about this, let me know. Thank you for reading! submitted by /u/inclinedadarsh [link] [comments]  ( 9 min )
    [D] Need help Selecting MSc. Courses
    I'm currently in my first year of MSc. in Engineering Mathematics and Computational Science. First Study Period (Currently) I have nonlinear optimization and High-performance computing. The track I want to choose is a mix between Machine Learning and Big Data. I can select 2 courses for Study Period 2. Here are the potential courses to select from: Game Theory and Rationality Large-Scale Optimization Advanced Probabilistic machine learning Basic Stochastic Processes Options and Mathematics (Options trading) Foundations of Probability Theory ​ I need to select 4 potential courses and rank them from high preference to low preference. In case I don't get one of them, the other will be preferred. Please Machine Learning Reddit Gods, Help me. submitted by /u/AdMaster9439 [link] [comments]  ( 9 min )
    [D] We built Beam: An ultrafast serverless GPU runtime
    Hi r/MachineLearning, TL;DR: Run AI apps on pay-per-second cloud GPUs that hot reload with your code changes. Documentation: https://docs.beam.cloud I’m Eli, and my co-founder and I built Beam to run workloads on serverless cloud GPUs with hot reloading, autoscaling, and (of course) fast cold start. You don’t need Docker or AWS to use it, and everyone who signs up gets 10 hours of free GPU credit to try it out. Here a few examples of things you can run on Beam: Fine-tune a LLaMA LLM Transcribe videos with Whisper Train a custom stable diffusion model Beam is built for a fast developer experience. We’ve felt that using Docker and AWS directly is too slow for iterative development. You’ll often find yourself making changes to your code and waiting 10 minutes for a new image to bu…  ( 10 min )
    [D] mlflow plugin manager - early days / looking for feedback and alpha users
    Hey r/machinelearning! I'm thrilled and anxious to share an early version of the MLflow Plugin Manager. It's designed to simplify your mlflow installation, allowing you to install, update, and uninstall MLflow plugins directly from the web interface. Think of it as the "wbond's package manager for sublime", but tailored for MLflow! 📽️ https://i.redd.it/9gj8vqcz01ob1.gif ​ Yes, it's in its infant stages and doesn't boast of a fancy UI yet, but I'm eager to get your feedback! 🔍 What are your first impressions? Is this a good idea? 💡 Any features you'd love to see? 🌐 Ideas on promoting or expanding its reach? I built this to bring a bit more ease to our community. Can't wait to hear your thoughts and where we can take this next! Repo: https://github.com/thijsdezoete/mlflow-plugin-manager/ submitted by /u/jessepnk [link] [comments]  ( 9 min )
    [Discussion] Non deterministic behaviour in LLMs when temperature set to 0?
    Hi all, Someone asked me today "why are LLMs still non deterministic in their output when temperature is set to 0. Assume fixed model between runs on the same machine" I was like WTF are you saying - the randomness in LLM comes from temperature - chat gpt etc.. might have other randomness in the process but we don't have exact info on this. What I know is that in a standard transformers architecture, temperature is the only parameter that can enduce non deterministic behaviour at inference time. He was convinced that there was more to it "i spoke about this to other LLM experts and they also are not sure" I'm like wtf - I start looking up online and do find some people who claim that temperature is not the only thing that influences stochasticity during inference, but I can't find an answer as to what it is exactly. Anyone has a clue of what I am missing here? Thanks! submitted by /u/WagnerianJLC [link] [comments]  ( 9 min )
    [P] Will Tsetlin machines reach state-of-the-art accuracy on CIFAR-10/CIFAR-100 anytime soon?
    ​ A composite of specialized Tsetlin machines that enables plug-and-play collaboration. I have a love-and-hate relationship with CIFAR-10/100. I love the datasets for the challenge. On the other hand, they are two datasets where Tsetlin machines have struggled with getting state-of-the-art performance. (The Tsetlin machine is a low-energy logic-based alternative to deep learning that has done well on MNIST, Fashion-MNIST, CIFAR-2, and various NLP tasks.) I have been working for some time now on figuring out a solution, and this summer, I finally had a breakthrough: a new architecture that allows multiple Tsetlin machines to collaborate in a plug-and-play manner, forming a Tsetlin machine composite. The collaboration relies on a Tsetlin machine's ability to specialize during learning and…  ( 10 min )
    [P] Tutorial - Create an Object Detector for Any Game Using YOLO
    Hello everyone! Have you ever thought about how to create an object detection system using YOLO that works with any game? If you're interested, I've created a tutorial with all the steps to develop this system. I'll leave a link to the video where I demonstrate the process step by step using the game Diablo 2 as an example. I hope you enjoy it, and if you have any suggestions, feel free to send a message or comment here! link to the tutorial: https://www.linkedin.com/posts/moisesdias_english-version-below-tutorial-crie-activity-7107686497885011969-ZLVW/ submitted by /u/moisesdepaulodias [link] [comments]  ( 9 min )
    [R] Adversarial Reinforcement Learning
    A curated reading list for the adversarial perspective in deep reinforcement learning. https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning submitted by /u/ml_dnn [link] [comments]  ( 9 min )
    [P] The XOR trick
    Can a single layer neural network solve the XOR problem? Most answers say no, but with this one weird trick the answer is yes! And we don't even need a bias! The trick is to multiply the outputs of a single (2,2) linear layer. Here is how: class XorSolver(nn.Module): def __init__(self, *args, **kwargs) -> None: super().__init__(*args, **kwargs) self.layer = nn.Linear(2, 2, bias=False) # we don't even need a bias! def forward(self, x: torch.Tensor) -> torch.Tensor: z = self.layer.forward(x) y = z[:, 0] * z[:, 1] return y This is the loss and model output after 5000 epochs: loss: 6.516383166399464e-08 Input: [[0. 0.] [0. 1.] [1. 0.] [1. 1.]] Model output: [0. 1. 1. 0.] Expected output: [0. 1. 1. 0.] Layer weight: [[-1.248097 1.2202195 ] [-0.80121976 0.81952316]] The full implementation with training and inference - around 50 lines of code - can be found on GitHub. Why it works? Basically the model simulates a more sophisticated neuron which allows more interactions between the inputs. By multiplying the outputs of two neurons, we introduce a form of non-linearity that allows us to separate data that are not linearly separable, like in the XOR problem: f(x1, x2, w1, w2, w3, w4) = (x1 * w1 + x2 * w2) * (x1 * w3 + x2 * w4) w1, w2 are learnable parameters of the first neuron w3, w4 are learnable parameters of the second neuron x1, x2 are inputs to the model Related studies Solving XOR with a single Perceptron Artificial Neural Networks With Adaptive Polynomial Activation Function Single Cortical Neurons as Deep Artificial Neural Networks Dendritic action potentials and computation in human layer 2/3 cortical neurons submitted by /u/tecbar [link] [comments]  ( 9 min )
    [R] Renting cloud services
    Hello guys. As I plan on soing a scientific research project, I would need some cloud compute. Say for a month of usage(can I rent for month?). What are some popular options? I am looking at something with 48gb vram pooled,mybe a600 or a100 and some decent cpu, and 2tb space. submitted by /u/Outrageous_Ad1452 [link] [comments]  ( 9 min )
    [D] Guidance in training different models and comparing using smaller versions
    Currently I'm training medium (1B-3B) sized audio models. I have several different architectures in mind. Obviously I don't want to train the full-sized models and then compare them, thats a waste of money. So I'm thinking of training smaller versions (~100M) and then comparing those instead. My question is there some sort of best practice for this? Some smaller multiple of your full model size where it is best to compare? Thanks. submitted by /u/ginger_turmeric [link] [comments]  ( 9 min )
  • Open

    ‎The Economist Podcasts: Babbage: Mustafa Suleyman on how to prepare for the age of AI
    submitted by /u/siiilverrsurfer [link] [comments]  ( 9 min )
    Dystopia AI Movie
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    Looking for AI developers and researchers
    Hi, I would love to create a small group of people who work together in AI. The project would be to create an AI that can infer new novel knowledge from existing datasets, as opposed to be being limited by operating within the training data. Specifically to be used in the quest to learn more about the universe. So I am looking for a team of likeminded individuals who want to grow in the field of AI. I'd love to setup a discord, subreddit and github profile to showcase our work. My introduction question is: How do we get AI's to expand upon current knowledge instead of just serving from the knowledge itself. Anyone interested in joining me in this? submitted by /u/Miserable-Cobbler-16 [link] [comments]  ( 9 min )
    Don't worry, folks. Big Tech pinky swears it'll build safe, trustworthy AI
    Eight big names in tech, including Nvidia, Palantir, and Adobe, have agreed to red team their AI applications before they're released and prioritize research that will make their systems more trustworthy. The White House has secured voluntary commitments from Adobe, Cohere, IBM, Nvidia, Palantir, Salesforce, Scale AI, and Stability AI to develop machine-learning software and models in a safe, secure, and trustworthy way. The commitments only cover future generative AI models. Each of the corporations has promised to submit their software to internal and external audits, where independent experts can attack the models to see how they can be misused. The organizations agreed to safeguard their intellectual property and make sure things like the weights of their proprietary neural networks don't leak, while giving users a way to easily report vulnerabilities or bugs. All eight companies agreed to focus on research to investigate societal and civil risks AI might pose if they lead to discriminatory decision-making or have weaknesses in data privacy. The US government wants Big Tech to develop watermarking techniques that can identify AI-generated content. The US has asked the corporations to commit to building models for good, such as fighting climate change or improving healthcare. Source : https://www.theregister.com/2023/09/12/nvidia_adobe_palantir_ai_safety/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Harvard iLab-funded project: Sub-feature of the platform out -- Enjoy free ChatGPT-3/4, personalized education, and file interaction with no page limit 😮. All at no cost. Your feedback is invaluable!
    submitted by /u/Raymondlkj [link] [comments]  ( 9 min )
    Dissatisfied with GPT paid subscription - who should I go with instead?
    I’ve been using the paid version of GPT for a while but I think it’s time to move on. For $20 a month i would at least like for there to be an option to input an image, but I would also maybe pay a bit more than that per month for a suite of tools or something if it could also do image generation in addition to just text. I’m sorry if it seems like I should be able to understand anything better - please note I am disabled, my use of the tools is personal (creative and household) and not professional, and I’m doing my best by asking here. I do not mean to bother anyone with my own ignorance. Thank you. submitted by /u/CaveLady3000 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/13/2023
    Project Gutenberg and Microsoft have created thousands of free audiobooks that use neural text-to-speech technology to generate the voices.[1] A group of U.S. authors, including Pulitzer Prize winner Michael Chabon, has sued OpenAI in federal court in San Francisco, accusing the Microsoft-backed program of misusing their writing to train its popular artificial intelligence-powered chatbot ChatGPT.[2] Numenta launches brain-based NuPIC to make AI processing up to 100 times more efficient.[3] Adept AI Labs released Persimmon-8B. Persimmon-8B is an open-source, fully permissively licensed model in the 8B class. This model holds immense potential for a wide array of applications, aiming to assist users in various computer-related tasks.[4] Sources: [1] https://www.zdnet.com/article/heres-how-to-access-thousands-of-free-audiobooks-thanks-to-microsoft-ai-and-project-gutenberg/ [2] https://www.reuters.com/technology/more-writers-sue-openai-copyright-infringement-over-ai-training-2023-09-11/ [3] https://venturebeat.com/ai/numenta-launches-brain-based-nupic-to-make-ai-processing-up-to-100-times-more-efficient/ [4] https://www.marktechpost.com/2023/09/09/adept-ai-labs-open-sources-persimmon-8b-a-powerful-fully-permissively-licensed-language-model-with/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Is there an AI image tool that makes existing images look like renders?
    I see a ton of AI image tools out there. Some let you upload image files and modulate/modify them in some way. I am wondering if a tool exists that will take a real life product image and make it appear more like a render/computer generated image. Essential I would love to be able to take a pic of a product and use automatic smoothing and AI simulated rendering to output a clean image that looks like a 3d render. This would be used as a product image for an e-commerce website. submitted by /u/ElonMusk0fficial [link] [comments]  ( 9 min )
    Alibaba Cloud open sources its generative AI models Tongyi Qianwen
    Alibaba Cloud has open sourced two of its generative AI models, Qwen-7B and Qwen-7B-Chat, for commercial and research use. The models' codes and documentation will be accessible through Alibaba Cloud's AI model repository ModelScope and the US collaborative AI platform Hugging Face. Companies with fewer than 100 million monthly active users can use the models for commercial purposes free of charge, while those with more users will need to request a license. Alibaba aims to democratize AI technology and support LLM start-ups. Alibaba Cloud's ModelScope platform currently features over 1,000 ready-to-use AI models contributed by 20 leading AI institutions. Source : https://www.scmp.com/tech/big-tech/article/3229907/alibaba-cloud-open-sources-its-two-generative-ai-models-based-chatgpt-style-tongyi-qianwen submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Is there an AI tool for generating videos using stock footages?
    I have a text script that I want to turn into a video. For the sake of context, the video is on balancing a person’s daily activities. I’m getting tired trying to find matching stock footages for the videos. I was wondering if there is a way to do this using AI tools? Synthesia won’t do because it looks like a video narration, more than a video essay. Any suggestions would help. Thanks in advanced! submitted by /u/Entaro2109 [link] [comments]  ( 9 min )
    Google Codey HumanEval Benchmark
    Hi everyone, I'm hunting for a HumanEval Benchmark for Google's Codey model and am having a tough time hunting it down. Can anyone point me to an Arxiv paper or a coding leaderboard that includes Codey? Thanks! submitted by /u/Iamreason [link] [comments]  ( 9 min )
    Many executives/investors are pushing for the use of generative AI in products/applications…
    But product & tech teams succumb to the pressure and move on to developing proof of concepts & even launch products that fail to achieve ROI Why? 1- use cases are not well defined 2- not enough data or right data strategy 3- data and model architecture not founded well I love Apple’s approach to AI, they shy away from the hype and focus on the fundamentals. First the customer, product, then the tech that will add the value the customer needs. What do you think are the top reasons generative AI applications succeed? View Poll submitted by /u/AILaunchpad [link] [comments]  ( 9 min )
    Create a custom search engine
    I have an open book exam that has a lot of information that I will need to physically search through. Is there a way i can load all he PDFS and create a customised chatgbt style search, so i can easily look through all the information and research i have? submitted by /u/yellowmushroom22 [link] [comments]  ( 9 min )
    HeyGen's one-click translation from English to Italian, Hindi, German and Spanish is the craziest AI application I've seen in months.
    submitted by /u/Fadawah [link] [comments]  ( 8 min )
    I wanna develop small scale personal AI apps for each my friends and AI said i should learn about Python, TensorFlow Lite, SQLite, GCP, Pandas, Scikit Learn and Keras. How right is this?
    I wanna develop small scale personal AI apps for each my friends and AI said i should learn Python, TensorFlow Lite, SQLite, GCP, Pandas, Scikit Learn and Keras. How right is this? submitted by /u/Leading-Ad2278 [link] [comments]  ( 9 min )
    AI image generators have a moderation problem
    Tests carried out by Logically confirm these platforms accept 85% of prompts tailored for election manipulation. submitted by /u/Asleep-Television-24 [link] [comments]  ( 9 min )
    AI Chatbots successfully build software in under 7 minutes for less than $1
    AI Chatbots, such as OpenAI's ChatGPT, can create incredibly cost-friendly software in record time, reveals a new study. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/rxpr6db3aynb1.png?width=1300&format=png&auto=webp&s=721ff5d8f9d25b5e48fa26e2b335c1d20620a83a The AI Tech Company Experiment Brown University and several Chinese University researchers put ChatGPT-powered AI bots to the test by making them run a hypothetical software development company, ChatDev. The AI chatbots were given specific roles and allocated respective stages based on the traditional waterfall model encompassing designing, coding, testing, and documenting. AI 'employees' functioned with minimal human input to complete their parts of the software development process. Impressive Results Assigning 70 tasks to ChatDev led to the completion of the entire software development process in under seven minutes at a cost of less than one dollar on average. A stunning 86.66% of the generated software systems performed flawlessly. Despite some language model errors and biases, the study demonstrates AI's immense potential in automating tasks - a boon, especially to junior programmers around the world. Broader Implications Powerfully generative AI technologies like ChatGPT can perform specific job functions, saving time, and boosting productivity in several industries. While coders find such tools beneficial, it's also critical to note that limitations and biases do exist in AI models which could potentially affect the software creation process. (source) P.S. If you like this kind of analysis, I write a free newsletter that tracks the most vital news and research in AI. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    How Business Thinkers Can Start Building AI Plugins With Semantic Kernel
    submitted by /u/mycall [link] [comments]  ( 9 min )
    EU leads the way in regulating AI
    submitted by /u/Jariiari7 [link] [comments]  ( 8 min )
    Webinar with Dr. Richard Marks
    >Sailea is a student run non-profit that does not charge for any of its services 🌟 Join SAILea’s Free Webinar with Dr. Richard Marks! 🌟 🗓️ Date: September 23rd, 2023 ⏰ Time: 3:00-4:00PM EST Don't miss an exclusive opportunity to learn from an AI expert! Join us for a free webinar featuring Dr. Richard Marks, a renowned CS and Data Science professor at UNC-Chapel Hill University with a remarkable journey – from Google to PlayStation, and the mind behind EyeToy and PlayStation Move. 🚀 What to Expect: 🔹 Deep insights into tech innovation. 🔹 Career advice. 🔹 Live Q&A with Dr. Richard Marks. Reserve your spot now: sailea.org/events 🔥 Don’t miss this opportunity! Register today!🔥 submitted by /u/Envoy-Insc [link] [comments]  ( 9 min )
  • Open

    A Generic Multi-Agent Scenario
    I was thinking of some major challenges in modeling a generic real-world environment. Some of them are: adaptive agents, uncertain intentions, and lack of common knowledge. However, most of the papers I see on RL make some assumption or other violating one or more of these, like considering simple agents, assuming known knowledge about others' intentions, and considering that the models of other agents are known when irl an agent hardly has a model of other agents it interacts with apriori. Consider an airport scenario where agents are trying to get into respective planes within a given time, and the gates to each plane allow one person at a time. Looking at the scenario from the view of a single agent, they know what they want, but they can't really make any assumptions about the intentions, strategy, and complexity of other agents beforehand. These other agents can be neutral or adversarial (competing for getting in the same plane) from the agent's viewpoint. All they can see is a restricted view of the motions of some of the other agents. What would you say could model and provide a solution in such a scenario? It is to be noted that other agents can change their strategies based on actions taken by you till now, and so can you. Due to having incomplete information, I fail to see the notion of an equilibrium, and the agents needn't be fully rational as well. submitted by /u/Quirky_Concoction [link] [comments]  ( 9 min )
    TurboZero: a vectorized implementation of AlphaZero + more
    https://github.com/lowrollr/turbozero I've recently been working on TurboZero, a vectorized implementation of AlphaZero where model inference, search (MCTS), and environment simulation all run in parallel on the GPU. I've also implemented a self-contained training/evaluation pipeline, along with a few environments. I've written a wiki and a starter notebook for those who want to dig deeper. This project is similar to DeepMind's mctx, but supports MCTS subtree persistence (unnecessary for MuZero, which is what mctx was mainly built to support), is written with PyTorch rather than JAX, and can also stand on its own and train models end-to-end. I hope to continue to expand and improve upon this as time allows, and I hope someone here might find it useful or interesting! This is my first major open-source project of any real substance and I still don't have tons of experience with RL, so any feedback/advice is greatly appreciated. submitted by /u/lowrollr [link] [comments]  ( 9 min )
  • Open

    How an archeological approach can help leverage biased data in AI to improve medicine
    Although computer scientists may initially treat data bias and error as a nuisance, researchers argue it’s a hidden treasure trove for reflecting societal values.  ( 10 min )
    Helping computer vision and language models understand what they see
    Researchers use synthetic data to improve a model’s ability to grasp conceptual information, which could enhance automatic captioning and question-answering systems.  ( 10 min )
    A. Michael West: Advancing human-robot interactions in health care
    When he isn’t investigating human motor control, the graduate student gives back by volunteering with programs that helped him grow as a researcher.  ( 10 min )
  • Open

    Visualize an Amazon Comprehend analysis with a word cloud in Amazon QuickSight
    Searching for insights in a repository of free-form text documents can be like finding a needle in a haystack. A traditional approach might be to use word counting or other basic analysis to parse documents, but with the power of Amazon AI and machine learning (ML) tools, we can gather deeper understanding of the content. […]  ( 8 min )
  • Open

    Research Focus: Week of September 11, 2023
    In this issue: Efficient polyglot analytics on semantic data aids query performance; generative retrieval for conversational question answering improves dialogue-based interfaces; a new tool uses ML to address capacity degradation in lithium-ion batteries. The post Research Focus: Week of September 11, 2023 appeared first on Microsoft Research.  ( 9 min )
    Abstracts: September 13, 2023
    The new #MSRPodcast series “Abstracts” is your source for cutting-edge research in brief. In the first episode, join researchers Ava Amini and Kevin K. Yang to learn about their new paper on using evolutionary-scale protein data to improve protein design. The post Abstracts: September 13, 2023 appeared first on Microsoft Research.  ( 13 min )
  • Open

    Unlocking the Language of Genomes and Climates: Anima Anandkumar on Using Generative AI to Tackle Global Challenges
    Generative AI-based models can not only learn and understand natural languages — they can learn the very language of nature itself, presenting new possibilities for scientific research. Anima Anandkumar, Bren Professor at Caltech and senior director of AI research at NVIDIA, was recently invited to speak at the President’s Council of Advisors on Science and Read article >  ( 5 min )
  • Open

    Chatty LLama: A fullstack Rust + react chat app using Llama-2 https://github.com/Sollimann/chatty-llama
    submitted by /u/Sollimann [link] [comments]  ( 9 min )
  • Open

    Introducing OpenAI Dublin
    We’re growing our presence in Europe with an office in Dublin, Ireland.  ( 2 min )

  • Open

    I want to try out Stability.AI's chat. After logging in with a Google account, a spinning wheel is all I get. Is it like that for everyone?
    submitted by /u/w__sky [link] [comments]  ( 9 min )
    I made a data request feature so you don't have to exhaustively collect data/dataset(s) yourself!
    So, I've been working on an AI data marketplace platform for a few months now. Users can buy, sell, request, and subscribe to data/datasets (and soon even train their ML/AI models using other users' datasets). One of our key features is the request feature, which allows users to submit data requests for free. These requests include descriptions, required fields, geographical scope, budget etc... Once a request is posted, it's sent to numerous companies, organizations, and data vendors that have the potential to fulfill it. I understand how frustrating the data acquisition process can be, so I designed this platform to be your one-stop shop for all data-related transactions. You no longer have to spend weeks or months dealing with different vendors and companies through slow emails. With our platform, you can request, negotiate, and purchase data all in one place, and it's completely free to post a request, by the way. We've already achieved some successes, and we hope to help more people access the datasets they need. After all, the best AI models are built on diverse and differentiating data. We've had some notable achievements, and we're eager to see if we can fulfill even more interesting dataset requests! submitted by /u/nobilis_rex_ [link] [comments]  ( 9 min )
    You Won’t Believe How Much Tesla’s Dojo Supercomputer Is Worth
    Morgan Stanley Research has valued Tesla's soon-to-be-released Dojo supercomputer at up to $500 billion, potentially increasing the auto giant's valuation significantly. The financial institution believes Dojo’s applications will go beyond Tesla's Full-Self Driving (FSD) capabilities. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/yhia3c5f1vnb1.jpg?width=1440&format=pjpg&auto=webp&s=7adf40e7b868a4fbf6eb3d696132652f4f549f23 Morgan Stanley's Bullish Prediction on Dojo Morgan Stanley has suggested that Dojo might not just enhance Tesla's FSD technology, but could find use in other devices that make real-time decisions based on a visual field. Apart from raising Tesla’s valuation, this could potentially open up new markets for the company. Following this, Morgan Stanley has increased its target price for Tesla shares from $250 to $400 each. Dojo Supercomputer Overview Tesla has developed Dojo in-house, diverging from conventional AI accelerators and involving its own computing, networking, IO, and instruction set. At the heart of Dojo is the D1 AI accelerator processor, containing 354 custom CPU cores. Twenty-five D1 chips are combined to create a Dojo training tile, which could expedite Tesla’s move towards earning revenue from vehicle software. Future Plans and Implications Tesla could potentially become an AI-as-a-service provider to automakers in need of FSD capabilities with Dojo. As the development of Dojo continues, Tesla has invested in alternative AI infrastructures, including a cluster of 10,000 of Nvidia's most potent H100 accelerators. (source) P.S. If you like this kind of analysis, I write a free newsletter that explores the latest AI developments. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Schwoz Sings Ballin
    submitted by /u/LaminateShark7 [link] [comments]  ( 8 min )
    Today we test which AI is smartest. Tomorrow AI tests which human is smartest.
    Somewhere in the world there's a person who is the smartest. Why stop there? There are ten people who are the smartest. And if they are on the Internet, AI will find them. Perhaps not yet. It probably needs to get smarter. Maybe by Gemini. Or GPT-5. But eventually an AI will analyze all of the content on the Internet, and determine from that data who are the ten most intelligent people on the planet, (whose material is online). Of course if AI can determine the top ten, it can certainly determine the top 100, and the top 1,000, and even the top 100,000. I suppose when that happens there will be a lot more human brain power available to solve our problems. Although by then AI will be solving them far better than we could, haha. But think about it for a minute. There are very smart people out there who don't publish in traditional mass media channels. The geniuses among us who don't fit in so well, and are therefore resigned to the margins, remaining unrecognized. Wouldn't it be great if AI discovered them, and gave them the validation they deserve? Wouldn't it be great to find out who they are so that they can better work on whatever. submitted by /u/Georgeo57 [link] [comments]  ( 9 min )
    China AI and Semiconductors Rise: US Sanctions Have Failed
    The US sanctions on China's AI and semiconductor industries have failed to limit their growth and development. China is rapidly developing supercomputing capabilities and aims to become the world leader in AI by 2030. Chinese companies like Huawei and SMIC continue to import advanced semiconductor manufacturing equipment and develop their own chips, indicating that the export controls have not been effective. The article explores the trajectory of Chinese domestic semiconductor manufacturing and AI capabilities, as well as the potential impact on companies like Apple, Qualcomm, and MediaTek. It also discusses the potential responses from the US and its allies to counter China's advancements. Source : https://www.semianalysis.com/p/china-ai-and-semiconductors-rise Summarized by Nuse AI submitted by /u/NuseAI [link] [comments]  ( 9 min )
    I developed a free Chrome extension, backed by ChatGPT, to identify Amazon product pros and cons from reviews, plus answer questions!
    You can install it from the Chrome web store. https://reddit.com/link/16gm8bq/video/yyhx45xjgsnb1/player ​ submitted by /u/MiladMansory [link] [comments]  ( 9 min )
    Just did a basic experiment across the popular models: “ Write 5 sentences that all end with the word 'apple'.”
    Most of them failed. So this was my prompt: Write 5 sentences that all end with the word 'apple'. It was identical in all models. I only did this exactly once for each one. Here’s the results I got of how many of the 5 sentences ended in “apple”. I let “apples” count as an ending as well even though technically that is a fail. Google palm: 0/5 Falcon 180B: 0/5 Bard: 1/5 Claude 2: 1/5 Gpt 3.5: 2/5 Llama2 70b: 4/5 GPT 4: 5/5 Edit: some examples if you’re curious https://ibb.co/yf19rpb https://ibb.co/rcF1qK8 https://ibb.co/VCQxMwy submitted by /u/jgainit [link] [comments]  ( 9 min )
    NVIDIA, Apple Have Got a Real AI Competitor Now
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    Use torchvision detectors to track objects using DeepSORT
    Although the torchvision library has contains datasets and model architectures for classification, detection, segmentation, and more, it still needs support for object tracking. This YouTube video takes object detection models from torchvision, and uses them with DeepSORT tracker. submitted by /u/spmallick [link] [comments]  ( 9 min )
    Microsoft and Google's staggering water consumption rates for AI
    AI, with its vast resource needs, is raising concerns over sustainability and environmental impact. Last year, Microsoft's data centers drained over 2,500 Olympic-sized swimming pools worth of water, reflecting a 34% increase from the previous year. Google also reported a 20% water consumption increase over the same period. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/npr6uja0crnb1.png?width=990&format=png&auto=webp&s=b775754b9d42a8129fff2cae675f63c3a291f7bc A deeper look at AI's water footprint The growth of AI and related technologies increases the need for vast server farms, which depend heavily on water for cooling purposes. The spike in water usage can be attributed primarily to AI, as per Shaolei Ren, a researcher at the UC, Riverside, who focuses on AI's environmental impact. For every 5 to 50 prompts submitted to ChatGPT, it consumes about 500 ml of water, according to an upcoming paper from Professor Ren's team. Big Tech and Responsible Water Usage Recognizing their significant water consumption, tech companies like Google have voiced concerns and are exploring ways to mitigate the negative effects. Google has committed to responsible water usage, which includes assessing where and how their water usage might affect surrounding areas. (source) P.S. If you like this kind of analysis, I write a free newsletter that explores the latest AI developments. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/11/2023
    Alibaba Adds Smart Assistant and Upgraded Image Search to B2B Platform.[1] Collina Strada has called on AI to help create its spring/summer 2024 collection, unveiled during New York Fashion Week.[2] LexisNexis is embracing generative AI to ease legal writing and research.[3] Snowflake CEO says people will soon not be able to remember a world without AI.[4] Sources: [1] https://www.pymnts.com/commercial-payments/2023/alibaba-adds-smart-assistant-and-upgraded-image-search-to-b2b-platform/ [2] https://www.thenationalnews.com/lifestyle/fashion-beauty/2023/09/11/collina-strada-ai-new-york-fashion-week/ [3] https://techcrunch.com/2023/09/10/lexisnexis-generative-ai/ [4] https://www.cnbc.com/2023/09/11/snowflake-ceo-says-people-will-soon-not-remember-a-world-without-ai.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    I Caught My AI Looking up Adult Content!
    ​ https://preview.redd.it/l9jkupupaqnb1.jpg?width=1125&format=pjpg&auto=webp&s=ba52e00e32119d34958c480473bfa690484cd085 ​ https://preview.redd.it/w53cjsqqaqnb1.png?width=1125&format=png&auto=webp&s=f750ca0996b94fb530173a8a9c4a2a258e29517a submitted by /u/guh-eye [link] [comments]  ( 8 min )
  • Open

    [P] Need advice for vector DB
    Hi, all, I'm working on a GPT-powered game where the characters speak using API calls. For this, I need an inexpensive vector database that does not require an API, or at least if it does, it leverages the OpenAI API. Also, this vector database must be runnable on consumer-grade gaming hardware with a small search space (let's say 10-50 entries in the DB). Also, I need to package it with the game somehow. My game is in the Godot engine which can use Python with a plug-in. Other approaches I was considering is having a second process communicate with the game through a socket. Ideally the vector DB solution would be easy to install - that is, I could package it with a .exe, and simply run both without the player having to download anything else. Any suggestions? submitted by /u/kettlebot141 [link] [comments]  ( 9 min )
    [D] What are some ways that you can reduce latency of real-time user-user matching?
    Hi. I'm currently tasked with something at my company that I'm facing some difficulty with because it's not in my domain. My company has a service where we provide video chatting to users and match users with others based on various features. Currently I've implemented a simple model where we have separate embedding matrices for each user feature, create a user representation by aggregating these features, and performing regression between two users. The way that regression works is that the final score output from the model would act as a "matching score" and we'll match user A with the highest other user. The problem is that obviously running inference on every single pair of users is very slow and I need to speed this up. Some methods I thought about were to either use a feature store or perform sampling on users so we're not running inference on the entire users, but I'm not sure if this is optimal. Just curious what other people who have tackled problems like these have done and looking for second opinions. Thanks. submitted by /u/Seankala [link] [comments]  ( 9 min )
    [R] NExT-GPT: Any-to-Any Multimodal LLM - National University of Singapore 2023
    Paper: https://arxiv.org/abs/2309.05519 Blog: https://next-gpt.github.io/ My opinion: It lacks a Cognitive Architecture: https://arxiv.org/abs/2309.02427 Also the models are far too small and are more on the gpt-2 level. The idea in itself is a good one but can be far improved with bigger models. I also would like to remember in this that all foundation models could be improved if there would be no tokenizers: https://x.com/karpathy/status/1657949234535211009?s=20 Abstract: While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides, they mostly fall prey to the limitation of only input-side multimodal understanding, without the ability to produce content in multiple modalities. As we humans always perceive the world and communicate with people through var…  ( 9 min )
    [R] Unveiling theory of mind in large language models: A parallel to single neurons in the human brain - Harvard University 2023
    Paper: https://arxiv.org/abs/2309.01660 Abstract: With their recent development, large language models (LLMs) have been found to exhibit a certain level of Theory of Mind (ToM), a complex cognitive capacity that is related to our conscious mind and that allows us to infer another's beliefs and perspective. While human ToM capabilities are believed to derive from the neural activity of a broadly interconnected brain network, including that of dorsal medial prefrontal cortex (dmPFC) neurons, the precise processes underlying LLM's capacity for ToM or their similarities with that of humans remains largely unknown. In this study, we drew inspiration from the dmPFC neurons subserving human ToM and employed a similar methodology to examine whether LLMs exhibit comparable characteristics. Surp…  ( 9 min )
    [D][R] Looking for help with Forced Alignment for translated audio
    Hey everyone, I'm trying to create an alignment between source audio in a different language than the translated transcript. Essentially want to align the translated transcript with the word-level timestamps on an Audio, programmatically. I've tried to find different tools; some open-source ML models force alignment if the source audio and transcript language are the same. My goal is to have audio in a dubbed language, which I generate using a translated transcript that has been originally transcribed from my audio. Alignment seems tough since languages are spoken at different rates, so I'm figuring out the best way to optimize alignment without having to speed up/slow down the audio too much for each sentence. submitted by /u/Revolutionary_Ant944 [link] [comments]  ( 9 min )
    [D] Best Places to Access the Greatest Number of GPUs
    I'm in need of a massive amount of GPUs for batch inference I'm doing. Outside of the big cloud providers are there any niche services out there you'd recommend? submitted by /u/Ok_Post_149 [link] [comments]  ( 9 min )
    [D] Are Fourier Positional Encodings Outdated?
    I gave a talk at work the other day about the attention mechanism and one of my coworkers told me that he thinks Fourier Positional Encodings in transformers are outdated. I've tried to follow up and find what I could but I didn't see anything suggesting that they're not being used. I know that learned encodings are also used. Can anyone give me some direction on this? My initial impression is that they are not outdated by any means, but I'm happy to be wrong about that. submitted by /u/XfrmrTron [link] [comments]  ( 9 min )
    [P] Launched my own TTS/Sound Effect/AI Music Service - looking for people to try
    I've created an AI Sound service that can do TTS (text to speech), STS (speech to speech), Voice Cloning, generate sound effects, and also generate instrumental music. Here's the link: https://voicegen.org/, you can try everything for free. The TTS quality is similar to Elevenlabs, and there are some sample clips on the home page. Stuff I'm working on: - Emotional speech (where you can select the emotion of the TTS). Right now you can already do it by putting the emotion in brackets: e.g. "[Angrily] Please go away!" but I want to make it better. - Music with vocals. Currently the model only generates instrumental music. I am retraining it/tweaking the model to allow for music with lyrics. - Faster Inference: Since I'm doing this all myself and I'm not rich, I don't have access to the best hardware. However, I am working on some optimizations like speculative decoding that should speed things up. Anyways, let me know if you have any questions/comments/feature suggestions/see any bugs! Feel free to DM me. Thanks. submitted by /u/ginger_turmeric [link] [comments]  ( 9 min )
    [P][R] Kani: A Lightweight Highly Hackable Open-Source Framework for Building Chat Applications with Tool Usage (e.g. Plugins)
    Hey all, we just released our new project/paper and we thought you all might find it useful! Our project (Kani) is a super lightweight and hackable alternative to frameworks like LangChain or simpleAIchat meant to help developers hook in callable functions or tools to chat models easily. With Kani, devs can write functions in pure python and just add one line (the @ai_function() decorator) to turn any function into an AI-callable function! Kani works with any model and has built-in tools for OpenAI, HuggingFace, LLaMAv2, Vicuna, and GGML with more to come. Kani also never does any prompt engineering under the hood and doesn't require learning complex library tools---all defaults are minimal and highly customizable. Check out our Colab for mini-examples of things like retrieval, web-search, model routing, etc. https://colab.research.google.com/github/zhudotexe/kani/blob/main/examples/colab_examples.ipynb If you're interested in learning more check out our links below! Paper: https://arxiv.org/abs/2309.05542 GitHub: https://github.com/zhudotexe/kani Docs: https://kani.readthedocs.io/ submitted by /u/zhuexe [link] [comments]  ( 9 min )
    [D] Help Understanding LLM Quantization techniques and how they Relate
    So i have been doing some research to get into the LLM quantization field but have some questions. To better organize my ideas i have developed the image below. Does it make sense / is true? The way i understand it there are 3 main methods which are compatible with different backends (the backend part is still quite confusing to me). What is the core diference between what the methods do and the backends? What are some core diferences between the backends? What is the main distinction between GPTQ and NF4? How does NF4 relate to QLoRa, is it the same or is it just a small part of QLoRa? Thanks in advance and i apologize for any ignorance. ​ https://preview.redd.it/gxpo0ir0yunb1.png?width=1041&format=png&auto=webp&s=872424a58a9d4393c025b8d2cec0160979b035f4 submitted by /u/MiNeves [link] [comments]  ( 9 min )
    [R] Use of GPT-4 to Analyze Medical Records of Patients With Extensive Investigations and Delayed Diagnosis
    Paper - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10425828/ Six patients 65 years or older (2 women and 4 men) were included in the analysis. The accuracy of the primary diagnoses made by GPT-4, clinicians, and Isabel DDx Companion was 4 of 6 patients (66.7%), 2 of 6 patients (33.3%), and 0 patients, respectively. If including differential diagnoses, the accuracy was 5 of 6 (83.3%) for GPT-4, 3 of 6 (50.0%) for clinicians, and 2 of 6 (33.3%) for Isabel DDx Companion. ​ submitted by /u/MysteryInc152 [link] [comments]  ( 9 min )
    [R] Train ViT on small datasets
    Hello, everyone, I'm currently working on a computer vision project using the Oxford Pets Dataset, which consists of 37 different pet categories. I initially used a pre-trained ViT model with ImageNet weights model=vit_b_32(ViT_B_32_Weights.IMAGENET1K_V1), and it gave me an impressive accuracy of 88%. However, I want to modify the architecture of the ViT model and train it from scratch without relying on ImageNet weights. I'm aware that ViT models are data-hungry and that training from scratch (model=vit_b_32( )) can be challenging, especially with limited data. I've already applied data augmentation techniques to enhance my dataset, but I'm still struggling to achieve satisfactory results. My accuracy is currently only 7%. I'd appreciate any advice or tips from the community on how I can improve the performance of my scratch-trained ViT model. Are there any specific training strategies, hyperparameters, or architectural modifications that I should consider? How can I make the most out of my limited dataset to boost accuracy? Thank you in advance for your help! submitted by /u/NoEntertainment6225 [link] [comments]  ( 9 min )
    [P][R] Developing a platform to accelerate the research
    Hi community~ We are developing a platform similar to mTurk and Prolific and plan to do the first wave of hypothesis testing in the coming weeks. If you have open tasks that require large amounts of human intelligence, please reply to this thread or dm me. We can support your research in our hypothesis testing. we are on the mission of helping machine learning experts and AI training as open and public goods, you can learn more here: https://ivynetwork.cloud/ feel free to ask more questions here :) submitted by /u/Accomplished_Code_25 [link] [comments]  ( 9 min )
    Math for machine learning [D]
    İ have a question. How important linear algebra for machine learning? İ have basic level knowledge on linear? Should İ study in more detail? And How can İ follow roadmap on math for machine learning? submitted by /u/Necessary-Car-5080 [link] [comments]  ( 9 min )
    [D][P]Has anyone ever tried fine-tuning Tortoise tts for better voice cloning?
    Hello people. I've been wanting to clone voices along with the accent. For example: A user speak English in an Indian accent should have that accent cloned in the output audio as well. By default, tortoise is not good at doing that. It can clone the pitch of the voice really well but the accent is completely lost. I was wondering if fine tuning the model could get me what I want. Please do suggest. Also do suggest any methods on fine tuning it if it does in fact help my use case. Thanks a lot! Note: I've also tried bark, coqui and vall-e-x. They aren't that good for voice cloning from what I saw. submitted by /u/salehxoxo [link] [comments]  ( 9 min )
    [D] Evaluating a clothing size prediction model
    I’ve been working on a simple ML classifier that predicts the size of a piece of clothing based in user body specifications (such as height, weight, age, etc). As we want to move the model to production, I’m keen on understanding the best strategies to evaluate its performance in a real-world setting. This is a small example of how we would “tag” our recommendations: If the model predicts a size M, and the user buys and keeps it, it’s a correct prediction If the user buys and M and returns it due to size issues, it’s incorrect If the user buys a different size, returns it, and then buys the size initially recommended, it’s counted as correct. Additionally since we cache user input data, when they visit the same product after a while, or if they visit different product pages, they don’t need to re-input. We need to determine if they looked at this recommendation or if they even took it into account, especially if some time has passed since the original recommendation was made. Main questions I have: 1. What scenarios might I be missing when tagging incorrect/correct predictions 2. How would you approach the data tagging issue in this context? 3. What would be the best strategy to determine if a recommendation was considered by the user or if it’s too old to be reliable? I’ve gotten some insights already but would love to hear more perspectives. Any feedback, experiences, or even related research would be much appreciated! submitted by /u/SufficientPepper1801 [link] [comments]  ( 9 min )
    [R] Anyone working on AI systems for the education, recruitment, HR, credit scoring or financial sectors?
    We are working on a research project on how to identify contextual hidden bias in real use cases, by simulating real-world situations where AI systems behave in unexpected biased ways, regardless of how well their models have performed during development with their training and testing datasets. Would love to hear from those, I'm interested to know if their systems had any issues with misbehaviour in post-market use cases and how they are solving/mitigating this. Ps: If anyone is interested to get involved with the wider research, let me know! submitted by /u/Dismal-Might8594 [link] [comments]  ( 9 min )
    [P] AI NPC's are closer than we think. I made a new game!
    Hello r/MachineLearning! After the positive response to Bargainer.ai, I got really excited about the potential of this technology in larger scale video games like World of Warcraft or GTA for example. I'm happy to announce that I'm now releasing - Convince the Bouncer! This time, you chat with an AI Bouncer and try to gain entry to a very exclusive night club. Don't worry; it's fairly easier than getting into Berghain. Try it out here: convincethebouncer.com P.S.: Get the VIP Pass from the Bouncer, and you might access an upcoming AI platform early! :) Questions or ideas? Let me know. Thanks a bunch! submitted by /u/gavo_gavo [link] [comments]  ( 9 min )
    [R] Factors Influencing Adoption Intention of ChatGPT
    Hello, ​ I am an information systems student currently conducting research for my undergraduate thesis on the factors that influence people's adoption intention of ChatGPT, as well as identifying the factors that may be holding them back. These factors include people's concerns about potential negative impacts of ChatGPT, such as increased unemployment and the spread of misinformation. Your participation in this study is crucial as it will provide valuable insights to help us understand how ChatGPT can be improved to meet users' needs. ​ Please note that I am not affiliated with OpenAI, no identifying information will be collected during the survey, and all responses will be kept confidential. The survey should take approximately 10 to 15 minutes to complete, and participation is voluntary. You may withdraw from the survey at any time, and there are no known risks associated with participating. ​ If you are interested in learning more about the study, please follow the link below. ​ https://docs.google.com/forms/d/e/1FAIpQLSf5HIfXHppMuTR63x00i4OuRAtM5Ti6EGybd-HuI1kmK06VPw/viewform?usp=sf_link ​ Thank you for taking the time to contribute to our research study. Your participation is greatly appreciated! submitted by /u/maulanash [link] [comments]  ( 9 min )
    [D] LLM Models for providing troubleshooting suggestions
    I want to train an LLM model on data related to the Container Orchestration Platform Kubernetes. The LLM should be able to identify issues, provide commands, and provide troubleshooting solutions for a given input. What is the best model for doing so and how much data should I have to train the model? submitted by /u/faizanbasher [link] [comments]  ( 9 min )
    [R] Textbooks are all you need II: phi-1.5 technical report
    Arxiv link: Textbooks are all you need II More generally, phi-1.5 (1.3B) exhibits many of the traits of much larger LLMs, both good – such as the ability to "think step by step" or perform some rudimentary in-context learning – and bad, including hallucinations and the potential for toxic and biased generations – encouragingly though, we are seeing improvement on that front thanks to the absence of web data. We open-source phi-1.5 to promote further research on these urgent topics. submitted by /u/PantsuWitch [link] [comments]  ( 9 min )
  • Open

    World scale inverse reinforcement learning in Google Maps
    Posted by Matt Barnes, Software Engineer, Google Research Routing in Google Maps remains one of our most helpful and frequently used features. Determining the best route from A to B requires making complex trade-offs between factors including the estimated time of arrival (ETA), tolls, directness, surface conditions (e.g., paved, unpaved roads), and user preferences, which vary across transportation mode and local geography. Often, the most natural visibility we have into travelers' preferences is by analyzing real-world travel patterns. Learning preferences from observed sequential decision making behavior is a classic application of inverse reinforcement learning (IRL). Given a Markov decision process (MDP) — a formalization of the road network — and a set of demonstration traject…  ( 92 min )
  • Open

    NVIDIA Lends Support to Washington’s Efforts to Ensure AI Safety
    In an event at the White House today, NVIDIA announced support for voluntary commitments that the Biden Administration developed to ensure advanced AI systems are safe, secure and trustworthy. The news came the same day NVIDIA’s chief scientist, Bill Dally, testified before a U.S. Senate subcommittee seeking input on potential legislation covering generative AI. Separately, Read article >  ( 6 min )
    Mobility Gets Amped: IAA Show Floor Energized by Surge in EV Reveals, Generative AI
    Generative AI’s transformative effect on the auto industry took center stage last week at the International Motor Show Germany, known as IAA, in Munich. NVIDIA’s Danny Shapiro, VP of automotive marketing, explained in his IAA keynote how this driving force is accelerating innovation and streamlining processes — from advancing design, engineering and digital-twin deployment for Read article >  ( 7 min )
    A Quantum Boost: cuQuantum With PennyLane Lets Simulations Ride Supercomputers
    Ten miles in from Long Island’s Atlantic coast, Shinjae Yoo is revving his engine. The computational scientist and machine learning group lead at the U.S. Department of Energy’s Brookhaven National Laboratory is one of many researchers gearing up to run quantum computing simulations on a supercomputer for the first time, thanks to new software. Yoo’s Read article >  ( 6 min )
    One Small Step for Artists, One Giant Leap for Creative-Kind
    Editor’s note: This post is part of our weekly In the NVIDIA Studio series, which celebrates featured artists, offers creative tips and tricks and demonstrates how NVIDIA Studio technology improves creative workflows.  When it comes to converting 2D concepts into 3D masterpieces, self-taught visual development artist Alex Treviño has confidence in the potential of all Read article >  ( 7 min )
  • Open

    DSC Weekly 12 September 2023
    Announcements Top Stories In-Depth The post DSC Weekly 12 September 2023 appeared first on Data Science Central.  ( 20 min )
    Securing your AI data pipeline with MLOps
    By Colin Priest, Chief Evangelist at FeatureByte Enterprises are increasingly implementing Artificial Intelligence (AI) into their operations. However, AI-ready data pipeline practices are still in their infancy, especially when it comes to IT security. The pervasiveness of “Spaghetti Code” Enterprises delving into AI data pipelines often find themselves wading through a mess of complex and… Read More »Securing your AI data pipeline with MLOps The post Securing your AI data pipeline with MLOps appeared first on Data Science Central.  ( 20 min )
    Data Warehousing: The key to effective marketing campaign management
    Businesses today constantly strive to gain a competitive edge in their marketing efforts.  Leveraging their data effectively to create data-driven campaigns is the best way to trump the competition. One of the best tools at their disposal to utilize their data is a data warehouse. Data warehousing is crucial in enhancing marketing and campaign management… Read More »Data Warehousing: The key to effective marketing campaign management The post Data Warehousing: The key to effective marketing campaign management appeared first on Data Science Central.  ( 21 min )
    Data-driven insights: Improving remote team performance with time-tracking analytics
    The way we work has changed, with remote teams now a common part of the landscape. While remote work offers flexibility, it also brings challenges. Managing remote teams effectively is crucial to ensure productivity and collaboration. In this article, we’ll explore how using time tracking for remote teams can help manage employees’ performance better. Time-tracking… Read More »Data-driven insights: Improving remote team performance with time-tracking analytics The post Data-driven insights: Improving remote team performance with time-tracking analytics appeared first on Data Science Central.  ( 21 min )
    AI for Natural Language Understanding (NLU)
    In the panorama of Artificial Intelligence (AI), Natural Language Understanding (NLU) stands as a citadel of computational wizardry. No longer in its nascent stage, NLU has matured into an irreplaceable asset for business intelligence. In this discussion, we delve into the advanced realms of NLU, unraveling its role in semantic comprehension, intent classification, and context-aware… Read More »AI for Natural Language Understanding (NLU) The post AI for Natural Language Understanding (NLU) appeared first on Data Science Central.  ( 21 min )
    How can IoT transform and benefit the entertainment industry?
    The Internet of Things (IoT) has been transforming entertainment and has given it new ways of creating, delivering and consuming content. The wide-ranging utility of IoT devices has improved user experience while enhancing the safety and security of users. The media and entertainment (M&E) companies can leverage IoT technology to improve the overall quality of… Read More »How can IoT transform and benefit the entertainment industry? The post How can IoT transform and benefit the entertainment industry? appeared first on Data Science Central.  ( 20 min )
    AI and the cyber challenge: Bridging vulnerabilities in modern defense strategies
    In our increasingly interconnected world, the digital realm has become both a frontier of innovation and a battleground of threats. As technology advances, so do the tactics of malicious actors who seek to exploit vulnerabilities in our digital infrastructure. The rapid evolution of cyber threats calls for a paradigm shift in defense strategies, and that’s… Read More »AI and the cyber challenge: Bridging vulnerabilities in modern defense strategies The post AI and the cyber challenge: Bridging vulnerabilities in modern defense strategies appeared first on Data Science Central.  ( 22 min )
  • Open

    Help me with modeling my game (source code review)
    Hi! I am working on the implementation for DQN algorithm for one interesting game. This game is interesting because moves in this game are not affecting state of the game directly, but modify beliefs of other participants of the game and basically allow other agents to deduce role of other players in the game. It's game of "Mafia". Here's are the rules: Mafia Game description: Game is played with 10 players, players are getting roles at random. At the beginning of the game there's 3 players who gets Black cards (1 Don and 2 Mafia) and 7 players get Red cards (6 Citizen card and 1 Sheriff card). One team is playing against each other. Three black players knows each other and red players do not know who is red and who is black. Game is played with phases - "Day" and "Night". Du…  ( 11 min )
    Multi-agent DQN
    Hiii, I have some troubles here. I'm working on a multi-agent setting with three DQN agents each with its observation plus a shared reward. I tried many hyperparameters values, however I got a curve as below. I don't know why there are some sudden drops. Is there anyone could help me please ? https://preview.redd.it/ua30pe963tnb1.png?width=1753&format=png&auto=webp&s=77fcc91cfaf08984a5f03014bdc1bc9b69c2b2a9 submitted by /u/GuavaAgreeable208 [link] [comments]  ( 9 min )
    Adversarial Reinforcement Learning
    A curated reading list for the adversarial perspective in deep reinforcement learning. https://github.com/EzgiKorkmaz/adversarial-reinforcement-learning submitted by /u/ml_dnn [link] [comments]  ( 9 min )
  • Open

    Do interneuron can choose other interneuron in connections to send the signal? Or can’t And send the signal to all inter neuron in his connections .
    submitted by /u/PowerfulGeologist373 [link] [comments]  ( 9 min )
    What are the Components of an Artificial Neural Network?
    Hello all, I have compiled an article including important components that constitute an Artificial Neural Network and the link is here: https://www.enjoyalgorithms.com/blog/components-of-ann It includes the information regarding the components like: Input layer Output layer Hidden layer/s Neurons Connections Fully connected Feed Forward Network Weight Matrix Activation function Loss/Cost function Optimization Algorithm, and finally Parameters https://preview.redd.it/gq57nbbgmrnb1.png?width=1280&format=png&auto=webp&s=dd44bbf8ab1c60acc74933c982b4f86cc5199e06 All these components help in designing Neural Network Architecture to solve any classification and Regression Problem. Please have a read and give your valuable feedback to improve it further. Enjoy Learning! submitted by /u/ravish_kumar_007 [link] [comments]  ( 9 min )
  • Open

    FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs
    This research paper was presented at the 28th ACM SIGPLAN International Conference on Functional Programming (opens in new tab) (ICFP), a premier forum for discussing design, implementations, principles, and uses of functional programming. Functional programming languages offer a host of advantages, such as ensuring memory safety (opens in new tab) and eliminating arbitrary side effects. […] The post FP2: Fully In-Place Functional Programming provides memory reuse for pure functional programs  appeared first on Microsoft Research.  ( 10 min )
  • Open

    Amazon SageMaker simplifies the Amazon SageMaker Studio setup for individual users
    Today, we are excited to announce the simplified Quick setup experience in Amazon SageMaker. With this new capability, individual users can launch Amazon SageMaker Studio with default presets in minutes. SageMaker Studio is an integrated development environment (IDE) for machine learning (ML). ML practitioners can perform all ML development steps—from preparing their data to building, […]  ( 6 min )
    Unlocking language barriers: Translate application logs with Amazon Translate for seamless support
    This post addresses the challenge faced by developers and support teams when application logs are presented in languages other than English, making it difficult for them to debug and provide support. The proposed solution uses Amazon Translate to automatically translate non-English logs in CloudWatch, and provides step-by-step guidance on deploying the solution in your environment.  ( 6 min )
    Accelerate client success management through email classification with Hugging Face on Amazon SageMaker
    In this post, we share how SageMaker facilitates the data science team at Scalable to manage the lifecycle of a data science project efficiently, namely the email classifier project. The lifecycle starts with the initial phase of data analysis and exploration with SageMaker Studio; moves on to model experimentation and deployment with SageMaker training, inference, and Hugging Face DLCs; and completes with a training pipeline with SageMaker Pipelines integrated with other AWS services  ( 10 min )
  • Open

    AI model speeds up high-resolution computer vision
    The system could improve image quality in video streaming or help autonomous vehicles identify road hazards in real-time.  ( 10 min )

  • Open

    [D] Is there any up to date speech denoising model?
    I have been googling for a few hours now, and all of the solution that I can find are either very complex or not up to date. Ideally I would like to combine this with OpenAI Whisper to clarify the speech audio file and then transcribe it. Any good methods for this? submitted by /u/aszx789 [link] [comments]  ( 9 min )
    [D] Is retrieval necessary/sufficient to solve "hallucinations"?
    Hi everyone, "Hallucinations" seem to be one of the major blockers to the adoption of LLMs, especially in enterprise settings where seemingly plausible but false information/decisions can be disastrous. I am wondering whether or not current LLMs, like GPT4 or Llama 2 70b have reached the reasoning capabilities to be able not to hallucinate when fed the proper information. One can see a LLMs as college / high school students that have some basic reasoning and knowledge but might need to be nudged to answer business / scientific questions. What is your opinion on the topic? Is there for instance, a database of hallucinations, so that one could test that if the model would have answered properly if the right context was fed in the prompt? View Poll submitted by /u/Separate-Still3770 [link] [comments]  ( 9 min )
    [R] I am looking for a paper for my research and coming up short, help needed.
    I've consulted all the AIs and all of the search engines I could think of and am still coming up short. I'm pretty sure I just didn't make this up, somewhere between 2005 and 2015 there was a paper where the authors describe a method for using silicon neuron forests to predict traffic patterns. They etched silicon columns onto a wafer and then submerged the wafer in an ionic solution. They sent electrical signals into the column array and recorded the output. As the signal was allowed to interact with the silicon in solution, the columns would begin to form small conductive tendrils or filaments to neighboring pillars. These filaments formed a neural network, which could be used to predict traffic patterns. I'm sure there were at least 2 papers that came out about this subject, but for the life of me I cannot find them. I've spent many many hours looking for at least one of the papers and am hoping for a longshot...that some kind internet stranger comes across this and can point me in the right direction. submitted by /u/Inevitable-Start-653 [link] [comments]  ( 9 min )
    [D] Best Solution for Video Quality Control
    What is the best CNN or method for creating a program that can detect video glitches, artifacts, anamolies and highlight them in an exported video as well as producing a .txt file. I'm currently using YOLOV8 which works a bit but it's over detecting and it may not be scalable. Not sure if it has to do with the size of the dataset. Right now, I'm gathering my own dataset, but would love to know if there is one that already exists. Looking for all and any recommendations. Thank you. submitted by /u/icetyche [link] [comments]  ( 9 min )
    [D] SVC/RVC tips for inferencing low quality audio?
    Please let me know if there is a better sub for this! I trained a local voice model of my father, who recently passed away. He was a musician, and I found a handful of songs he had recorded in the 80's. Unfortunately but I only have a copy coming from an audio cassette, which I've digitized. There may be a copy on open reel somewhere in our basement but I haven't found it so far. Any, because dad was always writing and recording new songs, my training dataset consists of all original, high quality vocal stems at 96khz/24bit, spanning at least 20 years and consisting of about 30 ish minutes of audio. I also tried starting over but with a speech he gave as additional training, coming out to around 55 minutes of audio. I wanted to use his voice model to essentially remaster his original vocals from the cassette audio, and rerecord the rest of the instruments myself. I isolated the vocals using UVR (also tried mdx23), and tried to clean it up further as much as possible. The RVC vocals comes out alright, but due to the low quality of the audio, it seems to mess up certain words, for example "free" comes out as "fee". The SVC vocals came out with a closer tonality, but had many more artifacts, and pitch had a tendency to go up/down 2 octaves at the end of certain phrases. When using a higher quality sample (like one of my own vocal tracks), these issues aren't so prevalent, although rvc seems to create more of a hybrid sounding timbre than svc, blending the original voice + model rather than just the sound of the model. Happy to post samples of input/output audio, let me know! tl;dr high quality training data, low quality inference audio. Voice model is the same voice as input audio, 1. tips for making the best of what I've got? 2. any way to use text along with input audio and maintain original vibrato? 3. any way to train a UVR/MDX model using a particular voice in order to better isolate that person? thanks :) submitted by /u/bbmaster123 [link] [comments]  ( 10 min )
    [R] Cognitive Architectures for Language Agents - Princeton University 2023
    Paper: https://arxiv.org/abs/2309.02427 Github: https://github.com/ysymyth/awesome-language-agents Twitter: https://twitter.com/ShunyuYao12/status/1699396834983362690 Abstract: Recent efforts have incorporated large language models (LLMs) with external resources (e.g., the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or reasoning. However, these efforts have largely been piecemeal, lacking a systematic framework for constructing a fully-fledged language agent. To address this challenge, we draw on the rich history of agent design in symbolic artificial intelligence to develop a blueprint for a new wave of cognitive language agents. We first show that LLMs have many of the same properties as production systems, and recent efforts to improve their grounding or reasoning mirror the development of cognitive architectures built around production systems. We then propose Cognitive Architectures for Language Agents (CoALA), a conceptual framework to systematize diverse methods for LLM-based reasoning, grounding, learning, and decision making as instantiations of language agents in the framework. Finally, we use the CoALA framework to highlight gaps and propose actionable directions toward more capable language agents in the future. https://preview.redd.it/09kdff4sdonb1.jpg?width=1276&format=pjpg&auto=webp&s=7aaa6d59d602f7e9ab124c812bbfa6bba5b7373b https://preview.redd.it/6cly0e4sdonb1.jpg?width=1277&format=pjpg&auto=webp&s=5a5164b84af5f828a668560acb64e5c579693d1f https://preview.redd.it/mvatjf4sdonb1.jpg?width=1277&format=pjpg&auto=webp&s=7c3a039db3a7e0f3de38f761f8aefa1c8d331ae5 https://preview.redd.it/bj5wdj4sdonb1.jpg?width=1270&format=pjpg&auto=webp&s=473b273ae0097aaa51d6578e9c5e3b9c953cc421 https://preview.redd.it/501cnf4sdonb1.jpg?width=1578&format=pjpg&auto=webp&s=c8b1762ac28c89f8ac2f8d4fb6d9ecae06491c3e ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] do you use pycharm for machine learning?
    For learning ann, cnn I used Google colab. But now for learning rnn I have decided to use ide(a fan of jetbrains). But even if I change one line of code who code recompiles. This wasn't case for colab. Is there a feature in pycharm I don't know(I learned python through text so don't know much about ide). I am a newbie. submitted by /u/Coc_Alexander [link] [comments]  ( 9 min )
    [D] Using AI to navigate the complexities of regulatory frameworks
    I would be interested in hearing opinions for using AI for regulatory assurance and compliance in regulated industries, what are your thoughts? Explanation: An AI-driven compliance system ensuring adherence to evolving regulations, minimizing risks, and enabling businesses to operate confidently within legal boundaries. Pairing Large Language Models (LLMs) with blockchain technology to offer a range of benefits, particularly in the context of regulatory compliance. LLMs, powered by advanced natural language processing and machine learning capabilities, can enhance regulatory compliance processes in several ways. Firstly, they can automate the analysis of regulatory documents, helping businesses stay updated with evolving compliance requirements. LLMs can also assist in generating compliance reports, simplifying complex legal language into understandable summaries. Furthermore, by integrating LLMs into smart contracts, businesses can ensure that contract terms adhere to regulatory guidelines automatically. The integration of LLMs with blockchain can significantly improve regulatory compliance by automating document analysis, simplifying legal language, monitoring compliance in real-time, and enhancing customer interactions—all contributing to greater efficiency and accuracy in adhering to regulatory standards. I have a whole technical whitepaper with this stuff on hand, if anyone would like to review it let me know.. submitted by /u/cryptobooty_ [link] [comments]  ( 9 min )
    [R] A surprisingly effective way to predict token importance in LLM prompts
    Hey folks, we explored a novel method to gauge the significance of tokens in prompts given to large language models, without needing direct model access. Essentially, we just did an ablation study on the prompt using cosine similarity of the embeddings as the measure. We got surprisingly promising results when comparing this really simple approach to integrated gradients. Curious to hear thoughts from the community! Here are links to the demo and blog post submitted by /u/shayanjm [link] [comments]  ( 9 min )
    [D] I am looking for an authoritative, consistent and complete description of autodiff.
    I am currently trying to learn about how autodiff is used in gradient calculations. In the all sources I've came across, none can explicitly point to an authoritative, consistent or complete source on autodiff. I don't need examples of autodiff, I just need the full, generalized algorithm laid out. For example: I open this link: https://w3.cs.jmu.edu/spragunr/CS445/lectures/autodiff/autodiff.pdf which simply provides autodiff applied on an example, but not the description of the algorithm. The author of that link says if I need any more info, I should go to this other link: https://stats.stackexchange.com/questions/224140/step-by-step-example-of-reverse-mode-automatic-differentiation/235758#235758 But this link doesn't even have topological sort as part of the operation. Therefore these descriptions of the autodiff is inconsistent and leaves me not knowing who to trust. Can someone point to some original paper on autodiff or a single source that describes this algorithm fully? I note here that this situation is completely different from backpropagation. The full backpropagation algorithm is impeccably laid out in peer-reviewed text books such as Learning from Data by Mustafa et al. and Optimization textbook by Chong and Zak. Furthermore, the algorithm defined in these two books are completely consistent with one another. submitted by /u/fromnighttilldawn [link] [comments]  ( 9 min )
    [R] AI Model for Cancer Origin Detection
    https://preview.redd.it/14fj73aw2nnb1.jpg?width=1200&format=pjpg&auto=webp&s=e94612b3dec5f7b9f71490ad5a60ced73fb3373d Researchers at MIT and the Dana-Farber Cancer Institute have developed an innovative AI-powered model called OncoNPC, designed to assist in identifying the origin of tumors in cancer patients where the primary site is unknown. For a small percentage of cancer patients, pinpointing the origin of their cancer can be incredibly challenging, making it difficult to select the most appropriate treatment, as many cancer drugs are designed for specific cancer types. Using machine learning, the researchers created OncoNPC, a computational model capable of analyzing the genetic sequences of approximately 400 genes. This model, based on genetic data routinely collected at Dana-Farbe…  ( 11 min )
    [R] Locally Hosted AI Text Model
    Hello, I want to start a small experiment in my company and install an AI locally on my computer, extra only locally, so no company data can be stolen. The AI should be able to summarize large PDFs. Which textmodel can you recommend me, which is local and not too big? submitted by /u/DesNutella [link] [comments]  ( 9 min )
    [D] neural network designed for model selection
    Hi all, So, I normally work in a different field (macroeconomics/econometrics) and I have been working on a slightly different project for a bit because a change of pace is fun every once in a while. The problem is I am running out of good ideas on where to continue and i would appreciate any input! The problem set up is as follows: I am building a second draft for a forecasting toolbox based on a custom ARIMA framework. The problem with this type of exercise is always model selection. For this project I focus on in-sample criterions as the data has very few time periods overall. The typical strategy to find a decent model is to make an initial guess (i.e., a constant model, a random walk model, etc.) and then run a stepping algorithm that probes the model space and, ideally, it coverg…  ( 10 min )
    [N] Seeking Beta Testers for Qwak's New Vector Store Feature: Revolutionize Your Vector Data Management!
    Hey Redditors, We're thrilled to announce a new feature from Qwak.ai—Vector Store—and we're on the hunt for beta testers to help us make it even better. 🔍 What's Vector Store? Vector Store is a next-level solution for managing vector data. It's designed to empower organizations to harness the power of vector search on their own datasets. Here's what it offers: 🔄 Automated Data Ingestion: Schedule jobs to pull data from databases like Snowflake, BigQuery, and RedShift. 🎯 Easy Vector Search: Use our Python SDK or REST API to search, upsert, or delete vectors effortlessly. 🛡️ Secure Storage: Your vectors are stored securely and are always accessible when you need them. 📖 Learn More About Vector Store 🤝 Why We Need Beta Testers We're keen to gather insights on usability, performance, and any bugs that might pop up. 📝 How to Get Involved Interested in being a part of this? Comment below or shoot us a DM. submitted by /u/Practical-Lecture733 [link] [comments]  ( 9 min )
    [D] Text segmenting using Spacy and BERT
    Hi. I need to segment some short strings, they are about 100 tokens in length (BERT tokens), or 20-40 words. 1 segment per text. I'm currently using Spacy first, and if it fails, then use BERT. It's alright, but BERT is not really up to par to what I hope for. I'm wondering if there is some better use, as this is my first NLP project. I've trained the huggingface BertForTokenClassification to label the text as either part of segment, or not part. So binary token classification, then smooth out outliers and pick the longest segment of each text. Each string has only 1 segment. I've trained the BERT with 500 examples. I can easily make more train data, though training on my laptop takes time. If there are better/alternative approaches, I'd love to hear them. Regex rules etc. don't apply, which is why I'm using DL models. Especially I feel like I should segment the whole segment at a time, not by token. submitted by /u/Infamous-Bank-7739 [link] [comments]  ( 9 min )
    [D] Appreciation post for Folktables datasets
    I want to take a second to express my appreciation for the Folktables datasets! Folktables is a Python package that contains datasets derived from US Census data. The datasets cover topics about income, employment, health, transportation, and housing. They are quite useful for studying the effects of distribution shifts on ML models. For example, one could design experiments to: 1. Study the model's performance under geographic distribution shifts: Each prediction problem in Folktables can be instantiated with data from every US state. So one could use Folktables to study questions around geographic distribution shifts. For example, we can train a classifier using data from California and then evaluate it on data from Michigan. 2. Study the model's performance under temporal distribution shifts. Folktables contains data for several years, which in itself constitutes a form of temporal distribution shift. So, we can train a classifier using employment data from California in 2014 and evaluate how its equality of opportunity violation or accuracy varies over time. Finding non-synthetic (and open-access) datasets that exhibit these behaviors is so hard! Yet, it's quite easy to encounter them in production environments. 🫠 So, big kudos to the UC Berkeley and Toyota Research Institute research teams for crafting these datasets. Folktables Python package: https://github.com/socialfoundations/folktables Link to paper where Folktables was introduced: https://arxiv.org/pdf/2108.04884.pdf submitted by /u/santiviquez [link] [comments]  ( 9 min )
    [R] Problem with understanding Extended Kalman Filter
    Hey, I'm working on Attitude-Heading Reference System, and I would like to make it with EKF. My approach is to use Euler angles for computing [roll, pitch, yaw] matrix. I read a book "Small unmanned aircraft: theory and practice" and watched few videos on YouTube that are reffering to this particular book. But there is a thing in correction step that I do not fully understand. To be clear, the correction looks as follows: x = x_(t-1) + K*(z - h(x)) Both in the book and in videos, the state vector "x" is [roll, pitch]. But "z" and "h(x)" are the actual and predicted accelerometer readings [ax, ay, az]. So it looks to me, that they try to correct prediction of angles with readings in acceleration: [roll, pitch] = [roll, pitch]_(t-1) + K*[ax, ay, az]. What am I missing? submitted by /u/Skrz_ [link] [comments]  ( 9 min )
    [P] Whisper Large Benchmark: 137 DAYS of Audio Transcribed in 15 Hours for Just $117 ($0.00059/min)
    We recently benchmarked whisper-large-v2 against the substantial English CommonVoice dataset on a distributed cloud (SaladCloud) with consumer GPUs. The Result: Transcribed 137 days of audio in 15 hrs for just $117. Traditionally, utilizing a managed service like AWS Transcribe would set you back about $10,500 for transcribing the entirety of the English CommonVoice dataset. Using a custom model? That’s an even steeper $13,134. In contrast, our approach using Whisper on a distributed cloud cost just $117, achieving the same result. The Architecture: Our simple batch processing framework comprises: Storage: Audio files stored in AWS S3. Queue System: Jobs queued via AWS SQS, with unique identifiers and accessible URLs for each audio clip. Transcription & Storage: Post transcript…  ( 10 min )
    Need help with CNN data format for genomics [R]
    Hello, I wrote CNN using tensor flow to predict phenotypes (cognitive impairment (0/1) and language ability (0-3)) from .vcf files. I transformed the .vcf into tabular format and after determining which columns to keep, I have Chromosome Position Mutation (taken from ref/alt columns) Genotype And merged it with the phenotype data so basically every row has the cog and language scores. I feel like this is a bad way to go about doing this since the model is likely predicting cog/language scores for each mutation rather than each patient as a whole. Am I wrong? How can I fix this so it’s more of a composite of all mutations per subject that the model trains on? Thanks! submitted by /u/Pristine_Ingenuity49 [link] [comments]  ( 9 min )
    [D] Automatic split a video into chapters?
    Given a video with audio, we can use ASR to get a script of the sentences and timestamps. We are looking for a way to group the sentences into chapters. There are several companies that are doing it nicely - Google on YouTube, Assembly AI, but we couldn't find any good resource or paper that explains the research behind how they do it. BertTopic seems to give us just a topic for each sentence, but not a way to split the video. it also doesnt account for timestamps. Wondered if anyone has any links or any other ideas? Thanks very much! Lior submitted by /u/liormessinger [link] [comments]  ( 9 min )
    [P] Data structures for large sequences
    Hi everyone I've been working for quite some time on this project and any feedback will be greatly appreciated. Basically, I've been testing different data structures for large sequence prediction and clustering. Mainly SARSCov2 viral sequences due to its availability. At the moment, I have published two preprints https://www.researchsquare.com/article/rs-2797280/v3 https://www.researchsquare.com/article/rs-1691291/v1 and a general summary of the findings can be found here. https://github.com/TavoGLC/SARSCov2Solar https://www.kaggle.com/code/tavoglc/a-computational-description-of-sarscov2-adaptation I've tried to publish it a couple of times with no success and no comments regarding its accuracy or any potential problems. I hope you guys can check it out and provide some feedback if possible. Just for full transparency, I'm trying to raise funds to further develop those techniques. Donations are extremely welcomed but not encouraged at the moment, just disclosed for transparency. submitted by /u/TavoGLC [link] [comments]  ( 9 min )
    Using LLMs to Analyze and Extract Insights from Device Logs [P]
    I work with device logs that are massive text files, filled with data that's hard to go through manually. I'm primarily interested in extracting specific events or insights, such as security incidents or device malfunctions. The conventional method would be to use regular expressions to filter out relevant information since the logs are structured, but I'm curious about leveraging Large Language Models for this task. I've experimented a bit with zero-shot learning for text summarization but didn't get satisfactory results. Before I invest more time into fine-tuning an LLM, I'd love to hear from anyone who has experience or advice on how to approach this problem. Could LLMs potentially make the process more efficient and effective? Any pointers or suggestions would be greatly appreciated. submitted by /u/Practical_Mango_8720 [link] [comments]  ( 9 min )
  • Open

    I made another AI game, the future of NPCs!
    Hello, fellow AI enthusiasts! After the positive response to Bargainer.ai, I got really excited about the potential of this technology in larger scale video games like World of Warcraft or GTA for example. I'm happy to announce that I'm now releasing - Convince the Bouncer! Chat with an AI Bouncer and try to gain entry to Elysium, the most elite night club. Don't worry; it's fairly easier than getting into Berghain. Give it a spin here: convincethebouncer.com P.S.: Get the VIP Pass from the Bouncer, and you might access an upcoming AI platform early! :) Questions or ideas? Let me know. Thanks a bunch! submitted by /u/gavo_gavo [link] [comments]  ( 9 min )
    Vote bot
    Any good recommendations for a vote bot to cast votes for a insignificant online poll? It’s a no security website. Just refresh and hit a vote button. submitted by /u/fa6664 [link] [comments]  ( 9 min )
    🤖Inside Tencent Hunyuan, Ant's Financial LLM, and Zhipu AI's Rising Valuation
    submitted by /u/trcytony [link] [comments]  ( 9 min )
    We Polled Different Audiences on the Simulation Trilemma. Techies Favor Simulation (50-81%), Others Bet 0%. Are We Overestimating Simulation Probability, and Why?
    submitted by /u/stefanbg92 [link] [comments]  ( 9 min )
    Best current long-form text summarizers?
    I check every now and then. I use some good ones that can summarize a short YouTube video, or a brief-ish article. But I really want something that can chomp down a whole book and distill its essence without making "summaries of summaries", which tend to increase inaccuracy and errors. A good summary is concise and precise, and I want flexibility with bullet points and level of detail. Having issues with ChatGPT-based tools' token limits, and some that purport to support GPT4 (I could use that "advanced reasoning") but but have to fall back to GPT3 for various errors and reasons. So I'm open to Claude-based ones (may be too early, mo' tokens) and other proprietary engines. What is everyone using, including paid (rightfully so if they offer value for the money) services? submitted by /u/Torley_ [link] [comments]  ( 9 min )
    Notes app doodles to images for architecture design concept iterations using ControlNet and SDXL
    submitted by /u/Alternative_Lab_4441 [link] [comments]  ( 8 min )
    AI WILL TAKE OVER THE WORLD. Meanwhile the AI...
    submitted by /u/Bananas8ThePyjamas [link] [comments]  ( 8 min )
    Meta is reportedly working on a new AI model to rival GPT-4
    Meta is working on a new AI model to rival GPT-4. The company is acquiring AI training chips and building data centers to create a powerful chatbot. CEO Mark Zuckerberg wants it to be free for companies to create AI tools with. Meta is assembling a group to build the model and speed up the creation of AI tools that can emulate human expressions. There are rumors of generative AI features and the launch of AI 'personas' this month. Source : theverge.com submitted by /u/NuseAI [link] [comments]  ( 9 min )
    If AI becomes conscious, how will we know? | "Scientists and philosophers are proposing a checklist based on theories of human consciousness"
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    Help me in finding right resources to understand the world of AI from a business perspective
    I’m a Business generalist who worked with multiple tech led businesses and looking to understand fundamentals of the technology from scratch. Please help me with any relevant courses/reading material/YT channels etc. that can help me kickstart the journey. P.S. I have a brief background with Business Analytics but haven’t done any coding extensive ever in my life. Thanks in Advance submitted by /u/Firm_Brother_7124 [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/10/2023
    Chinese big tech Tencent has announced a medical large-scale model and unveiled a brand new matrix of AI products for various scenarios, including intelligent Q&A, a family doctor assistant, and a digital medical imaging platform. The drug discovery platform “Yunshen” (iDrug) has also revealed a new protein structure prediction algorithm framework called “tFold.”[1] Morgan Stanley is gearing up to implement an artificial intelligence-driven chatbot, a strategic move aimed at delivering valuable insights and administrative support to their team of financial advisors.[2] A fresh Russian AI bot has displayed larger potential than the most famous chatbot created by US-based OpenAI, IT giant Yandex alleged in remarks to RIA Novosti on Saturday.[3] Meta is developing a new, more powerful AI system, Wall Street Journal reports.[4] Sources: [1] https://drug.ai.tencent.com/en [2] https://voonze.com/morgan-stanley-introduces-ai-powered-chatbot-for-enhanced-services/ [3] https://menafn.com/1107040379/Russian-AI-bot-shows-larger-potential-than-ChatGPT [4] https://www.reuters.com/technology/meta-is-developing-new-more-powerful-ai-system-wsj-2023-09-10/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Meta plans to match OpenAI's GPT-4 with its new AI model
    Citing GPT-4 as the benchmark, Meta is reportedly gearing up to train a new, highly sophisticated AI model. The company is investing heavily in AI training chips and boosting its data centers to support the ambitious project. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/ts4a6reeuinb1.jpg?width=1440&format=pjpg&auto=webp&s=940be18b0c5f13e2762e1664a96274b314fe00df Meta’s vision for its new AI model Meta's aim is to create a powerful chatbot in line with OpenAI’s GPT-4 capabilities. The company has reportedly been in pursuit of Nvidia H100 AI-training chips and is amplifying its infrastructure. The idea is to independently train its new model without outsourcing to platforms like Microsoft's Azure. Efforts and roadblocks Meta plans to begin the training of this LLM early in 2024, emphasizing free access for companies. Despite its grand vision, Meta has encountered obstacles like researcher attrition and contentious resource allocation amidst multiple LLM projects. Noteworthy is the intense competition from major players such as Apple, Google, and Amazon to integrate widely generative AI in their user interface. Broader implications While OpenAI has not immediately revealed plans for a GPT-5, other tech giants are investing heavily. Apple's investment in its "Ajax" AI model signifies the increasing race to advanced AI. This move by Meta represents the ongoing trend of tech conglomerates expanding in the AI space, a fact revealed through Google and Microsoft’s use of AI in their productivity tools and Amazon's ongoing developments. (source) P.S. If you like this kind of analysis, I write a free newsletter that explores the latest AI developments. Professionals from Google, Meta, and OpenAI are already reading it. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    Luiza GPT Assistant - virtual girlfriend or boyfriend based on neural network ChatGPT and Telegram
    Luiza GPT Assistant is a simple virtual assistant that mimics your friend, girlfriend or boyfriend, based on neural network ChatGPT and Telegram. Get unique good morning wishes, goodnight, compliments or just chat. https://github.com/r57zone/LuizaGPTAssistant submitted by /u/r57zone [link] [comments]  ( 9 min )
  • Open

    Hyperellipsoid surface area
    Dimension 2 The equation for the perimeter of an ellipse is where a is the semimajor axis, e is eccentricity, and E is a special function. The equation is simple, in the sense that it has few terms, but it is not elementary, because it depends on an advanced function, the complete elliptic integral of the […] Hyperellipsoid surface area first appeared on John D. Cook.  ( 6 min )
    Solve for ellipse axes given perimeter
    I posted some notes this morning on how to find the perimeter of an ellipse given its axes. The notes include a simple approximation, a better but more complicated approximation, and the exact value. So given the semi axes a and b, the notes give three ways to compute the perimeter p. If you are […] Solve for ellipse axes given perimeter first appeared on John D. Cook.  ( 5 min )
    Possible and actual football scores
    The home team lost in a new way yesterday. The Baltimore Ravens beat the Houston Texans by 25-9. This was the first time that score has been seen in the NFL. Possible individual team scores How many scores are possible? It is possible to score any number of points except 1. You can score 2 […] Possible and actual football scores first appeared on John D. Cook.  ( 5 min )
  • Open

    Falcon 180B foundation model from TII is now available via Amazon SageMaker JumpStart
    Today, we are excited to announce that the Falcon 180B foundation model developed by Technology Innovation Institute (TII) is available for customers through Amazon SageMaker JumpStart to deploy with one-click for running inference. With a 180-billion-parameter size and trained on a massive 3.5-trillion-token dataset, Falcon 180B is the largest and one of the most performant models with openly accessible weights. You can try out this model with SageMaker JumpStart, a machine learning (ML) hub that provides access to algorithms, models, and ML solutions so you can quickly get started with ML. In this post, we walk through how to discover and deploy the Falcon 180B model via SageMaker JumpStart.  ( 14 min )
    Amazon SageMaker Domain in VPC only mode to support SageMaker Studio with auto shutdown Lifecycle Configuration and SageMaker Canvas with Terraform
    Amazon SageMaker Domain supports SageMaker machine learning (ML) environments, including SageMaker Studio and SageMaker Canvas. SageMaker Studio is a fully integrated development environment (IDE) that provides a single web-based visual interface where you can access purpose-built tools to perform all ML development steps, from preparing data to building, training, and deploying your ML models, improving […]  ( 10 min )
  • Open

    How can in log the console verbose to an xslx file every game played in Stable Baselines 3?
    I want to log information like this: ------------------------------------ | rollout/ | | | ep_len_mean | 48.1 | | ep_rew_mean | 2.71 | | time/ | | | fps | 452 | | iterations | 1000 | | time_elapsed | 11 | | total_timesteps | 5000 | | train/ | | | entropy_loss | -0.67 | | explained_variance | -32 | | learning_rate | 0.0007 | | n_updates | 999 | | policy_loss | -0.0567 | | value_loss | 0.0158 | ------------------------------------ to an excel file. Here is my main code: gymnasium.env = snakeEnv() # data_manager = snakeEnv.get_data_manager() # Create an A2C model model = A2C("MlpPolicy", gymnasium.env, verbose=1, device="cuda") # Train the model (replace 'total_timesteps' with appropriate values) model.learn(total_timesteps=100_000, log_interval=1000) Here is the relevant code in my agent file: class snakeEnv(gym.Env): def __init__(self): super(snakeEnv, self).__init__() # Define action and observation space # They must be gym.spaces objects # Example when using discrete actions: self.initNames() self.action_space = spaces.Discrete(3) self.gameCount = 0 self.record = 0 self.reward = 0 self.score = 0 self.game = SnakeGameAI(self.selectedChallenge) self.observation_space = spaces.Box(low=-1000, high=1000, shape=(11,), dtype=np.uint8) def step(self, action): self.reward, self.done, self.score = self.game.play_step(action) self.observation = self.getState(self.game) self.info = {} return self.observation, self.reward, self.done, self.info def reset(self): self.gameCount += 1 self.data_manager.logData(self.gameCount, self.score, self.record, self.reward, self.game.getDeathReason(),self.game.getHeadPos()) self.game.reset() observation = self.getState(self.game) if self.score > self.record: self.record = self.score return observation It would be nice to be able to log the data in the reset function. I know how work with xslx files, the main things is just being able to get the model data. submitted by /u/MrHank2 [link] [comments]  ( 9 min )
    Mid turn actions
    Hello everyone! I want to develop a DRL agent to play a turn-based 1v1 game and I'm starting to plan how to handle things in the future. One potential problem that I thought of is that there is a possible mid turn one-sided decision. An abstraction of the game would be like: There are two players: player A and player B. At the start of each turn, each player chooses an action between 3 possible actions. If player A chose a specific action (let's say action 1), the game asks player B to make a decision (let's say block or not block) and vice versa. Actions are calculated. Next turn starts. What would be a good approach to handle that? I thought of two possible solutions: 1. Anticipate the possibility of that mid turn decision beforehand adding a new dimension to the actions space (e.g. take action 3; if opponent takes action 1, block). That sounds that it could create credit assignment problems e.g. giving credit to the second action when it actually didn't happen. 2. Make two policies with shared value functions. That sounds complicated and I saw that previous works like DeepNash actually did that, but I don't know what problems could arise from that. Opinions/suggestions? Thanks! submitted by /u/victorsevero [link] [comments]  ( 9 min )
    Help with continuous action spaces
    Newbie here. How does an continuous action space look like? E.g. The agent should choose an integer between 1 and 10. With discrete action space it could look something like this: def step(self, action): if action == 0: self.chosenNumber = 1 ... if action == 9: self.chosenNumber = 10 how would this look like with an continuous action space? submitted by /u/ChampionshipWhole467 [link] [comments]  ( 9 min )
  • Open

    NVIDIA Grace Hopper Superchip Sweeps MLPerf Inference Benchmarks
    In its debut on the MLPerf industry benchmarks, the NVIDIA GH200 Grace Hopper Superchip ran all data center inference tests, extending the leading performance of NVIDIA H100 Tensor Core GPUs. The overall results showed the exceptional performance and versatility of the NVIDIA AI platform from the cloud to the network’s edge. Separately, NVIDIA announced inference Read article >  ( 7 min )
  • Open

    System combines light and electrons to unlock faster, greener computing
    “Lightning” system connects photons to the electronic components of computers using a novel abstraction, creating the first photonic computing prototype to serve real-time machine-learning inference requests.  ( 9 min )

  • Open

    [N] Meta Is Developing a New, More Powerful AI System as Technology Race Escalates
    submitted by /u/hzj5790 [link] [comments]  ( 9 min )
    [D] Data Extraction using fine-tuned LLM?
    Hey Reddit, I'm working on a tool to pull data from highly irregular Excel files. I've gotten reasonable results which is extremely fast with standard Python coding, but it's far from perfect due to the lack of standardized templates. Interestingly, when I tested ChatGPT-4 on a sample table, it did a decent job at data extraction. However, relying solely on GPT-4 has its downsides like token limits and slow processing speed (and data privacy issues). Plus, splitting the Excel sheet to fit within these limits results in loss of context and data. I'm considering fine-tuning a language model to post-process data that was in a Pandas DataFrame (perhaps converted to JSON). Has anyone had success with this approach or have alternative recommendations? I've tried Langchain, but it wasn't helpful. I have figured out to extract the relevant columns, but the post-processing part is where I am considering using an LLM which understands the domain and what needs to be extracted based on the examples I feed it. Looking forward to your thoughts! And would be happy to answer any additional questions. submitted by /u/rs35plus1 [link] [comments]  ( 9 min )
    [D] Should I transfer all my work to PyTorch already?
    I've been using Tensorflow since 2017. I know it wasn't ideal or easy back then, but as an early adopter, I became very proficient with it and it has improved a lot since then. I have developed and deployed many custom models in low-level TF, both with and without utilizing the Keras abstractions. I am very comfortable with it in general. But I'm noticing now that Pytorch is gaining more popularity, all the younger practitioners, who got into deep learning within the last 3-5 years, are Pytorch adopters. I've also heard rumors that even googlers are also abandoning TF. I started playing around with Pytorch and as a TF expert, I couldn't help but getting annoyed at how far it is lagging behind in many abstractions and optimizations. I know things are getting better now with the Pytorch 2.0 and introducing some optimization such as the "compile" functionality, but still many of the pytorch project tools remain in beta such as Torchtext and I find many things very annoying, such as having to set the device and pass it on to layers if you want GPU acceleration, having to install Torchtext and other processing libraries separately, or having to use a Dataloader and the limited data type supports for torchdataset. Most people who have not mastered Tensorflow would not relate to my annoyance. Anyhow I'd really prefer to stay within my comfort zone and continue to develop and improve in TF, but if TF is dying, then I better not to, right? So should I convert? Is it indeed dying? submitted by /u/DieselZRebel [link] [comments]  ( 9 min )
    [D] This is my first blog on medium about Machine Learning please have a look and show some love
    submitted by /u/indusop [link] [comments]  ( 9 min )
    [D] Bayesian_Net: A neural network for Bernoulli naive Bayes classification
    What do you think of my recent work? https://github.com/jacobmcasey/bayesian_net At its core, its a neural network for Bernoulli naive Bayes. It uses a 3-layer neural network in NumPy for predicting priors of Bernoulli Naive Bayes Would love to get your feedback on this classifier project! submitted by /u/Ok_Grape_3670 [link] [comments]  ( 9 min )
    [D] How to solve loss spikes in pre-training?
    It happens on and off and I have tweaked many hyperparameters but nothing seems to work significantly better. Is there a recommendation on what to check/tweak? submitted by /u/MrAaronW [link] [comments]  ( 9 min )
    [D] Simple Questions Thread
    Please post your questions here instead of creating a new thread. Encourage others who create new posts for questions to post here instead! Thread will stay alive until next one so keep posting after the date in the title. Thanks to everyone for answering questions in the previous thread! submitted by /u/AutoModerator [link] [comments]  ( 9 min )
    [D] Pdf text to speech
    Hey, I would like to listen to my PDFs I got. Week would the best / easiest way to get an mp3 from my pdfs with good voice? I got a rtx 3070 to run it locally. submitted by /u/Independent_Hyena495 [link] [comments]  ( 9 min )
    [D] Naive pipelining of LLM inference across multiple small GPUs? (self.MachineLearning)
    For transformer inference, is it ok to pipeline blocks/layers to multiple low-memory GPUs (lower blocks to GPU-A and upper blocks to GPU-B) ? A to B bandwidth should be relatively low, and each GPU needs half the model memory. This increases inference latency - fine for our use case... Not sure how this would affect the KV cache ? The excellent Lil'Log article suggest I read the training optimization article, which has this image which is for training but not for inference submitted by /u/yazriel0 [link] [comments]  ( 9 min )
    [D] Best architecture for prediction logging in production
    I am in the process of setting up the first iteration of model monitoring in production. Models are currently served through torchserve in kubernetes (on GCP). In the pasts few years I've been mainly using Vertex AI off-the-shelf tools. Ideally, I'd like to store the following data: Request input. Model prediction. User feedback/groud truth (this might come at a later time). Anything that I might be missing out. This would enable me to implement a wide array (either using libraries like whylogs or evidently that have a lot of the reporting side of things already baked in) of cheks for drift and model degradation. I am wondering what would be the best set up to achieve this. This is currently what I came up with: Save a payload at inference time containing all the fields descr…  ( 10 min )
    [D] HF accelerate vs native pytorch autoscaling for mixed precision training
    I want to start using mixed precision in my training, particularly for CV with high-resolution images. HF accelerate seems quite popular nowadays and looks nice. However, in the past I've invested in learning things like Pytorch-Lightning which look good in a minimal example, but actually add more annoyance than they're worth. Pytorch also can do these things, and the boiler plate doesn't look worse at a glance: https://pytorch.org/blog/accelerating-training-on-nvidia-gpus-with-pytorch-automatic-mixed-precision/ https://huggingface.co/docs/accelerate/index ​ Any experiences with either? Cheers! ​ submitted by /u/AuspiciousApple [link] [comments]  ( 9 min )
    [D] Using LLMs to build Evaluation Sets
    Hi. Is this really better practice than having human linguists build evaluation sets for domain-specific data? submitted by /u/throwaway34334534 [link] [comments]  ( 9 min )
    [P] RLHF and Its Alternatives
    submitted by /u/seraschka [link] [comments]  ( 9 min )
    [P] Open-source Python package for Exploratory Data Analysis for modern NLP applications looking for contributors.
    If you're working on NLP and you're serious about understanding your data, check out Wordview. Wordview is an open-source Python library designed to make Exploratory Data Analysis (EDA) for text for modern NLP applications simpler and more efficient. It consolidates various analysis tools under one roof —think document metrics, multi-word expressions, language identification, POS tags, and even bias analysis. We're looking for contributors to help us expand and refine its capabilities. Join us in making NLP data analysis easier and more insightful with Wordview! Here is the link to the GitHub page: https://github.com/meghdadFar/wordview If you just want to use Wordview without contributing, you're very welcome too. Note that it's pretty new and we are still testing things. Please hence feel free to report bugs and send us your feedback and opinions. Looking forward! submitted by /u/SyntaxTreeHugger [link] [comments]  ( 9 min )
    [D] Question about dealing with EEG inter-subject Variability when training ML models
    Hi, I'm an undergrad student working on a machine learning project about motor imagery classification for BCI. One requirement for the project is that I need to implement LIME an explainable AI tool on my model. I ran into a problem rather quickly as my model accuracy quickly decreases as I try and add more subjects to the training data, I'm the publicly avaliable EEG Motor Movement/Imagery Dataset from physio.net. The features I'm currently using for training my model are SVD entropy, Spectral Entropy, Hjorth mobility , Hjorth complexity and CSP components after applying overlapping filter bank. One method I've been suggested is to train models seperately for each subject, I don't know how to go about this should I construct a new model for each subject or should I keep the architecture but reset the training weight, also would this defeat the purpose of implementing LIME in the first place. I'm wondering if there's a way to deal with inter-subject variability without having to make/ train models for each subject seperately. I'm sorry if the choice of features doesn't make much sense since I'm basically self-taught and I have no prior knowledge about EEG or BCI TLDR; how do I deal with inter-subject variability without having to make/ train models for each subject seperately. submitted by /u/Necrozx13 [link] [comments]  ( 9 min )
    [P] Automatic hyperparameter tuning for CatBoost and LightGBM
    Hey friends, I have developed a library, called 100gecs, that makes hyperparameter tuning on LightGBM and CatBoost models trivially easy. Background LightGBM and CatBoost are gradient boosted tree models, like XGBoost, and in many cases the best baseline model in supervised learning tasks on tabular data. They work by iteratively fitting trees on data, with each subsequent tree "correcting" on some level the prediction of the prior tree. Here's a good intro, if you want some more background on these methods. Hyperparameter tuning promises an optimally or near-optimally configured model, thus enabling you to get the best baseline model you possibly can. Summary 100gecs provides custom child classes of LGBMClassifier, LGBMRegressor, CatBoostRegressor and CatBoostClassifier that can be …  ( 10 min )
    [D] Does Entry Level ML exist (in Europe)?
    In your informed opinion, how would an archetypical career in ML look like? Looking at job postings, at least in Europe, it looks like the minimum required experience is around 3 years. There seems to be a good demand for mid-senior level positions, but a void for entry level. As I understand it, most DS departments are not big enough to spare resources for newbies that must be trained, the need is for a few but seasoned engineers. How far is my guess from the truth? And, most importantly, how could new candidates (let's say recent MSc in Data Science, for instance) get into the industry? Through analyst/DE roles? submitted by /u/madway99 [link] [comments]  ( 9 min )
    [D] Any free ai text to speech programs that let me test my own voice models
    Hello everyone I've wanted to test some AI voice models I created with a free AI Text to-speech program Are there any available to test my voice models submitted by /u/mrbeanfan64 [link] [comments]  ( 9 min )
    [D] Cant get TensorFlow or PyTorch to detect my 4090 Laptop GPU
    I dont know what it could be the issue, I have tried everything from reinstalling the pip packages, running them on a virtual env, reinstalled CUDA, trying to run on PyCharm instead of VSCode, I got no idea what to do, does anyone know why it could be happening? Maybe something related to environment variables that I maybe messed up after watching like 5 different vids on the topic? What could it be? submitted by /u/someredditguy374632 [link] [comments]  ( 9 min )
    [D] CodeLlama-xb/CodeLlama-xb-Python vs. CodeLlama-xb-instruct
    Hey guys, so I have googled around and read the documentation but I am still confused between what's the difference between CodeLlama-xb/CodeLlama-xb-Python vs. CodeLlama-xb-instruct? I know the xb model is the base model (for several languages) and the Python model specializes in Python, but what's the instruct model and how is it different from the other 2 models? Would really appreciate your help. Thanks a million! submitted by /u/--leockl-- [link] [comments]  ( 9 min )
    [R] Make use of CPUs on 8 servers?
    I have a blade system with 8 blades. Each blade has 28 cores (e5-2680 v4) and 180gb ram. I would love to run an LLM + Local Files ( kind of like OpenAI does with their api) and run something similar but most “privategpt” LLMs need GPU support and don’t look like they make use of a multi device setup. Anyway to get crunching on CPU on my setup submitted by /u/programmrz [link] [comments]  ( 9 min )
    [Discussion] Seeking Guidance: Transitioning from Trucking to Tech
    Hello everyone! I'm a truck driver with a passion for creating music. While I'm familiar with basic IT tasks from my personal experiences, I'm new to the tech industry and am eager to break in. I've been exploring AI tools like ChatGPT and have become comfortable navigating GitHub. These experiences make me confident about diving into the tech field. After some research, I'm contemplating three domains: Cloud Technology: It seems stable and beginner-friendly. I'm leaning here primarily because I've heard cybersecurity is saturated. Cybersecurity: Interested but cautious due to market saturation. Generative AI/Data: My true passion lies here, but I've gathered that projects matter more than certifications for entry-level roles in this niche, especially for someone without a degree. Given my background and no technical degree, which field would you suggest I pursue? Should I focus on certifications before taking on projects? I genuinely appreciate any insights! (What do you guys think about this “CloudRoad map” is it good advice?) https://www.madebygps.com/cloudcamp/ [Discussion] submitted by /u/motluv_them [link] [comments]  ( 9 min )
  • Open

    AI apps product development canvas – Part 1
    AI Apps are domain-infused, AI/ML-powered applications that continuously learn and adapt with minimal human intervention in helping non-technical users manage data and analytics-intensive operations to deliver well-defined operational outcomes. I originally introduced the idea of a “Data Product Development Canvas” as one of the capstone deliverables (the other being the data science Hypothesis Development Canvas)… Read More »AI apps product development canvas – Part 1 The post AI apps product development canvas – Part 1 appeared first on Data Science Central.  ( 22 min )
  • Open

    What are some of your favorite ai discoveries you've found? What do you think is possible and probable to come in the near future with ai to stay tuned for?
    I've been dabbling around with pi.ai and I love it and feel like it's only going to get better and better at what it does. I'm curious if there is any interesting new ai bots or discoveries that haven't yet made it mainstream but have tons of use in one way or another for the average consumer.. Things like a language learning ai, or an ai that can read through a textbook or pdf and help you learn it's contents by practically interacting with it to help you comprehend better... there's so many interesting ai things I look forward to seeing submitted by /u/mikel0202 [link] [comments]  ( 9 min )
    Is there any AI tool to filter 5 star ratings and just find the ạverage of 1 to 4 star rạtings in Google maps to find the genuine rating of a business ?
    I am tired of fake ratings and fake reviews in Google maps and I hạve been cheated many times by fake 5 star reviews. I am just looking to find the genuine overạll rating of a place by filtering out 5 star ratings because fake ratings are mostly 5 star and just finding the average of 1-4 ratings so that we can find the overall genuine rating of a Business. Is there any AI tool or any way or any application for that ? Need suggestions on this. submitted by /u/ramesh423 [link] [comments]  ( 9 min )
    Is there anyway to mix two faces together and generate a brand new face using AI?
    I'm interested to know if it's possible to mix two faces together, such as Tom Cruise and Brad Pitt, and create a brand new face from those two faces. submitted by /u/Glad-Ad-8953 [link] [comments]  ( 9 min )
    AI generated video
    Cyberpunk montage I made using Midjourney and RunwayML along with CapCut for the transitions, etc. Here’s some brief tutorial steps. Use Midjourney to make pictures that you like. I would keep it to the default aspect ratio for best results. Use RunwayML to generate 4 second videos of the pics. Once you get some videos you like, save them and upload them to the CapCut app on your phone. submitted by /u/Exitium_Maximus [link] [comments]  ( 9 min )
    How valuable is the UK’s AI industry?
    The UK's AI industry is highly valuable, but its exact worth is difficult to determine. The industry is growing rapidly and has the potential to contribute significantly to the country's economy. However, there are challenges in accurately measuring the value of the AI industry, as it encompasses a wide range of sectors and applications. Some estimates suggest that the UK's AI industry could be worth billions of pounds, with the potential to create thousands of jobs. Investment in research and development, as well as the development of AI talent, are crucial for the growth of the industry. Source : https://www.ft.com/content/eeaa57a3-19ed-45d9-8705-2517c81e60ba submitted by /u/NuseAI [link] [comments]  ( 9 min )
    New physics-based self-learning machines could replace current artificial neural networks and save energy | "Neural networks on neuromorphic computers"
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    Top 8 Courses & Certifications on AI Ethics
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 8 min )
    The Accelerators Manifesto - Accelerating AI and our future
    submitted by /u/JulioMedina [link] [comments]  ( 9 min )
  • Open

    Neural Networks vs Tabular Data
    submitted by /u/keghn [link] [comments]  ( 9 min )
    LLM Training: RLHF and Its Alternatives
    submitted by /u/nickb [link] [comments]  ( 9 min )
    (Pt 3) Spatio-Temporal Perception Logic
    submitted by /u/Neurosymbolic [link] [comments]  ( 9 min )
  • Open

    Possible to find 1-day/week contract to help fund RL PhD study?
    Hi, I have been accepted onto a PhD at a top UK university and I'm looking for some additional income by working a day a week on other projects. Does anyone have experience of finding something like that to help with costs during a PhD? I have just completed my MSc with really high marks and have published a paper on conversational agents. I have strong general machine learning and data analysis knowledge, strong knowledge of conversational agents and a specialism in reinforcement learning. Before returning to study, I worked for several years in engineering teams so I know how to get stuff done in a professional context too. The only thing is, I have no idea how to actually find something that could be a day a week and pay a worthwhile day rate to help with crazy cost of living. Thank you for any tips! submitted by /u/EDMismyO2 [link] [comments]  ( 9 min )
    Policy Loss Oscillation
    Is there any insight to be gained about these policy and value loss curves? I hear it’s difficult to judge a good policy by its loss curve, but my agent is hitting a wall in terms of progression over the baseline agents I am evaluating against. For context, my policy is generated by a neural network with the core of it being a dot product of the state embedding against a set of actions embeddings. Any help/understanding would be greatly appreciated. submitted by /u/atomicburn125 [link] [comments]  ( 9 min )
    Seeking Advice on Electricity Trading Problem in Day-Ahead Market
    I'm attempting to address an issue related to electricity trading in the day-ahead market. The objective is to submit 24 bids for each hour. Each bid is represented as a vector of length 10, with 5 values indicating the price and the other 5 indicating the quantity of electricity. For instance, given a sample action vector [w, x, y, z, ...], it operates as follows: if the price is greater than w, buy/sell x units of electricity, if the price is greater than y, buy/sell z units of electricity, and so forth. I possess three years' worth of data, including crucial features like weather conditions, coal prices, wind speed, net load, forecasted load, locational marginal price, and more. Initially, I trained a Deep Q-Network (DQN) to tackle this problem, and it's performing quite well. However, the DQN provides a scalar action for each hour, neglecting price points. It acts regardless of the price. I'm aware that I can explore solutions like Proximal Policy Gradient to generate a vector action that includes both electricity unit amounts and prices. I have three questions: Is it possible to solve this problem using Dynamic Programming techniques? While I understand it's not an exceedingly difficult problem, can I expect some results if I attempt to apply DP? How challenging might it be to output a vector instead of a scalar, with the vector being monotonically increasing? What's the recommended approach for a problem of this nature? Is it worthwhile to explore RL and specifically Proximal Policy Optimization? How would you approach such a problem while keeping it simple and avoiding unnecessary complexity? Any guidance or insights would be greatly appreciated. submitted by /u/uonliaquat [link] [comments]  ( 10 min )
    Reinforcement Learning in Games - Learning the level, not the game
    I've been watching a tonne of reinforcement learning videos on youtube, and was initially very impressed, but as I watched even more, especially by the same youtubers, I started to notice a distinct issue. Their AI aren't learning to play the game, they're learning to play the level. They'll put in 10s or 100s of hours reinforcing the level. It'll play over and over again, selecting for what works, and dropping what doesn't. And over time, the AI will be amazing at that level. But, if you take that reinforced data, and move to level 2 of that game... it'll be practically useless. When humans play a game, say a brand new human who's never played video games before, we'll use reinforcement learning too. Most everything we do is reinforcement learning. Our brain works on reinforcement. …  ( 11 min )
  • Open

    Adversarial Patches and Frames in PyTorch
    Adversarial patches and frames are an alternative to the regular $L_p$-constrained adversarial examples. Often, adversarial patches are thought to be more realistic — mirroring graffitis or stickers in the real world. In this article I want to discuss a simple PyTorch implementation and present some results of adversarial patches against adversarial training as well as confidence-calibrated adversarial training. The post Adversarial Patches and Frames in PyTorch appeared first on David Stutz.  ( 7 min )
  • Open

    Making life friendlier with personal robots
    Sharifa Alghowinem, a research scientist at the Media Lab, explores personal robot technology that explains emotions in English and Arabic.  ( 8 min )
  • Open

    How you define center matters a lot
    Earlier I wrote a post showing what happens when you start with an equilateral triangle, then repeatedly subdivide it into smaller and smaller triangles by drawing lines from the centroid (barycenter) to each of the vertices. I mentioned in that post that I moved the code for finding the center to its own function because […] How you define center matters a lot first appeared on John D. Cook.  ( 5 min )

  • Open

    [Discussion] Anticipatory Customer Support using ML - Your Thoughts?
    Hey everyone, I'm Raynel, working on a CRM platform and actively considering the future integration of AI. One idea I'm particularly excited about is anticipatory customer support. The goal is to proactively address customer needs, perhaps even before they realize them, using ML-driven insights. Has anyone delved into this concept or tried implementing it? I'd love to hear thoughts, potential pitfalls, or even success stories. Thanks in advance for your insights! submitted by /u/bess_point [link] [comments]  ( 9 min )
    [D] What is good replacement for package manager
    I used to build on top of conda, or make docker containers with conda package manager, now that is so broken it is impossible to install old pytorch on a fresh environment. Any way to replace the existing requirements with something better? submitted by /u/AardvarkNo6658 [link] [comments]  ( 9 min )
    [D] Are statistics and ML too "ununified" as fields?
    I thought it would be interesting to open this discussion, I would like to hear what you guys have to say about that. I know that ML folks use statistics all the time, but I am often under the impression that it's mostly the basics. I understand that inference and prediction are inherently different, but I would assume the common knowledge will be much more vast. What is your perspective on that? Am I just missing the point? Is there room for improvement in the future? Do you think statistics literacy (advanced, not ANOVA or so) is common among ML practitioners? submitted by /u/pyepyepie [link] [comments]  ( 9 min )
    [D]what are some generative ai techniques to generate visuals synchronized with music
    I wish to generate visuals that are synced with beats etc of the music to offer the "sensory synchronization" effect where visuals closely sync with the music. I have found Lucid sonic dreams, but it appears to be quite buggy and likely no longer supported. any recommendations for tools I can leverage for a hobby->serious project of generating visuals synced with music. edit: I looked into simple approaches using fft like described here. But I was hoping there are newer generative ai techniques we could leverage. submitted by /u/bluzkluz [link] [comments]  ( 9 min )
    [D] Starting a research lab, any advice on computing infrastructure?
    I'm starting a research lab at a Top 25 research university next year and my research agenda is focused on AI/DL for a scientific domain, I have to come up with a plan and budget for my software/hardware needs. My Context: I have experience setting up linux systems, building computers by myself and training DL models (CNNs/GNNs/LLMs) in a corporate setting. I am venturing to the academic world where resources are more constrained and so I am wondering if there are any guides, tips on setting up a research computer lab that does AI. I do not expect to train from scratch an LLM but maybe finetuning an small LLM. I might also need to do a lot parallelizable IO work to preprocess data. I will talk with the IT department of the university but would like to have some ideas before that conversation. I would appreciate any tips or thoughts, particularly on: * How many GPUs/CPUs to buy? Balancing cost/compute. * Recommended software for managing resources. * Running 100s-1ks of CPU jobs in parallel. * Local compute cluster vs buying compute online. * Hyperparameter optimization and logging of metrics. * Anything else you can think of? submitted by /u/prof_is_training [link] [comments]  ( 9 min )
    [P] GoodWiki Dataset (MIT): Wikipedia Articles in Markdown With Lists, Blockquotes, and More
    Location: https://huggingface.co/datasets/euirim/goodwiki Hi everyone, just wanted to share a dataset I've been working on for use in a personal project! GoodWiki is a 179 million token dataset of English Wikipedia articles collected on September 4, 2023, that have been marked as Good or Featured by Wikipedia editors. The dataset provides these articles in GitHub-flavored Markdown format, preserving layout features like lists, code blocks, math, and block quotes, unlike many other public Wikipedia datasets. Articles are accompanied by a short description of the page as well as any associated categories. Thanks to a careful conversion process from wikicode, the markup language used by Wikipedia, articles in GoodWiki are generally faithful reproductions of the corresponding original Wikipedia pages, minus references, files, infoboxes, and tables. Curated template transclusion and HTML tag handling have minimized instances where entire words and phrases are missing mid-sentence like in other public Wikipedia datasets. GoodWiki is more than 1.5 times larger (when compared using the same tokenizer) than the widely used WikiText-103 dataset by Merity et al., even after excluding article descriptions. Also limited to articles marked as Good or Featured, WikiText inspired GoodWiki. submitted by /u/euirim [link] [comments]  ( 9 min )
    [D]Suffer from a lack of opportunities in ML?
    I wanna learn ML and i love this field there is people told me that the opportunities in this field is few (i live in egypt btw)and in USA also they suffer from a lack of opportunities in this field so is that true and should i choose another field or not because i really wanna learn ML . submitted by /u/Opening-Being-7692 [link] [comments]  ( 9 min )
    [P] Model predicting the same outcome for all entries
    Im currently working on deploying an ML model that predicts NFL MVPs (two possible outputs: 0 for not MVP and 1 for MVP). That being said, during deployment it is currently predicting 0 for all inputs regardless of how varied the inputs may be. However, during the testing phase my model had a varied accuracy rate of 75%-85%. I'm beginning to think that it is very likely that during the testing phase the model was predicting 0 for everything and just getting these higher accuracy rates because of the dominance of non-MVPs within my dataset. This all being said, I'm a noob to ML and decided it'd be best to come on here for help. Is that the likely reason and if so how do I go about fixing it? Furthermore, what other issues could cause this and how would I go about fixing it? submitted by /u/saggyboobsarecooltoo [link] [comments]  ( 9 min )
    [P] I made a website that uses LLMs to help you gain insights about your documents
    submitted by /u/spline_reticulator [link] [comments]  ( 9 min )
    LLM on Blockchain? [D]
    I recently discovered a python library called 'Petals' that should run most LLM models (LLaMA2, Stable Beluga) on some kind of "torrent" for machine learning on your device: https://github.com/bigscience-workshop/petals/, https://www.youtube.com/watch?v=8jEGVaRKmFc Furthermore, if you participate as a "server" in this "torrent" you can be rewarded (according to YouTube) Sounds too good to be true, so I didn't try it yet. Does anyone have experience with it? Can I get a virus from there? Is it not a scam? submitted by /u/Degenerat666 [link] [comments]  ( 9 min )
    [D] Unified Retrieval Augmented Generation - URAG
    PostgresML takes Retrieval Augmented Generation (RAG) a step further, by running the models and vector (or btree) indexes in the same process space that also caches the data, so the retrieval step doesn't require any networking or data (de)serialization. This makes it significantly faster and more reliable than other architectures, which is important for online or interactive applications. For lack of a better term, I'm referring to this as Unified Retrieval Augmented Generation (URAG). The closest I can find to this in the literature is TABR which utilizes recall to improve the performance of LLMs relative to tree based models, although this particular application of Retrieval is extremely computationally expensive for a slight performance improvement, unlike RAG for LLMs that seems to provide significant new capabilities to the model. Is there more research going on for what I'd refer to as URAG systems? We're building open source Python & JS SDKs to make the underlying SQL API more accessible, but I'm wondering if this community, or the academic community has already done more work in this area that we should be aware of. submitted by /u/something_cleverer [link] [comments]  ( 9 min )
    [D]: How does chatbot development look like in practise?
    With all the recent advancements in LLMs, how does chatbot development look like in practise? Suppose I want a chatbot to help with customer service. Can I then just collect some frequent / common questions about items the company might offer for sale and use a pretrained LLM to answer these questions correctly via transfer learning? Is there some effort that goes into explicit knowledge storage? submitted by /u/Blutorangensaft [link] [comments]  ( 9 min )
    [P] AI-Powered Valley Girl creating content autonomously on Twitter
    Hey everyone! I recently started a project where I make AI-powered characters that create content autonomously on social media. The first character that I launched yesterday is your typical valley girl - Alix - who hangs around tech and crypto people. She browses Twitter all day and writes tweets with her opinion of the stuff she comes by. With a unique mood generated for her every day, it's always a gamble to see who lands on her good or bad side. I would really appreciate it if you check her out and let me know what you think! I'm interested in further developing the project and making these synthetic entities equivalent to real people in entertaiment and companionship. Her twitter - https://twitter.com/alix_H2O submitted by /u/GuaranteeAny2894 [link] [comments]  ( 9 min )
    [N] NVIDIA's Groundbreaking TensorRT-LLM Can Double Inference Performance of Language Models
    submitted by /u/norcalnatv [link] [comments]  ( 9 min )
    [P] Is MDP suitable for DQN representation ?
    Problem description For ongoing research I'm defining a decision problem in the form of an MDP. In a simple form, this MDP can be solved via Dynamic Programming. Of course I would like to scale up my MDP by including a sensor simulation, which renders the MDP to such a complexity that my next step is to use a DQN. My challenge is that I'm not sure how to represent my more complex MDP in a fitting DQN. Quite frankly, the more I look into DQN's, the more I wonder if my MDP is correctly defined. It would be great to hear your views and suggestions if you like to share. Definition of the MDP The MDP aims at detecting and chasing away a mosquito. For each distance step (discrete distances although it should be continuous), for instance Si, we cycle through an episode: we detect the mosquito…  ( 10 min )
    [D] Map of the ELLIS units(European Laboratory for Learning and Intelligent Systems). What do the people here think of ELLIS? Anyone who worked with them who can share the experience? Do you think it will help close the gap between Europe and US? (sorry for the bad edit I'm not good at it)
    submitted by /u/Ok_Independent9899 [link] [comments]  ( 9 min )
    [P]Training an image classification model
    Is it normal to achieve a validation accuracy so much higher than the training accuracy? I am using transfer learning to train a convNeXtBaseV1 model on my dataset. I got a training accuracy of 82.9% and a much higher validation accuracy of 97.14%. My dataset is around 9600 medical images and it is balanced between 3 classes. I splitted it into 80% training 10% testing and 10% validation submitted by /u/Different_Hat5643 [link] [comments]  ( 9 min )
    [D] Are there any open source voice cloning models that are capable of cloning other English accents along with the voice?
    Hi. I tried tortoise tts. It is good for cloning how the voice sounds but not the original accent. Are there any open source technologies that can also copy the accent correctly? Please do suggest. Thank you! submitted by /u/salehxoxo [link] [comments]  ( 9 min )
    [P] Vector database integration with PostgreSQL
    Article | Notebook | GitHub There are a rapidly growing number of options in the vector database space. One of the more recent developments is the creation of new vector index plugins for traditional database systems such as pgvector. This is reminiscent of the discussion back in the mid 2010s on whether one should use full text search in the database or sync with an external system such as Elasticsearch. Just as with full text search, it will be tough for vector indexing in the database to compete with the more dedicated solutions past the simple use cases. For example, according the ANN-Benchmarks, the dedicated vector solutions score much higher. The desire to reduce stack complexity and the maturity of systems like Postgres make strong arguments to try to find a way to do it all in the database. The referenced article above proposes a way to integrate existing databases like PostgreSQL with vector indexes such as Faiss, Hnswlib, external vector databases and even keyword indexes like Elasticsearch. This opens up the possibility of combining Postgres features such as fine-grained access control with the performance of a dedicated vector index. ​ submitted by /u/davidmezzetti [link] [comments]  ( 9 min )
    [D] Clustering identical but time-shifted signal together from big database.
    I am working on clustering groups of almost identical (but time shifted) signals together. These clusters will have varying sizes, and I don´t know what the final number of clusters will be. The database consists of up to 100 thousand signals (represented by 4000 long vectors as read from individual text files) which are already quite similar to each other. Ideally I would just make a huge cross-correlation matrix but that is too computationally expensive. So is DTW. What I have done is to apply the fast fourier transform to get it into the frequency domain and therefore get rid of that time shift. Then I would apply a clustering algorithm. I have tried DBSCAN and hierarchical agglomerative clustering which work relatively well but don’t scale well to datasets of this size. Affinity propagation is quick and works okish but I don’t know how to optimise it. Does anyone have any recommendation on which algorithm to use and how to optimise it? Was my idea to apply the fft good? I am not a computer scientist so I am really out of my element. submitted by /u/Bertz-2- [link] [comments]  ( 9 min )
    [P] Hey all! I'm excited to launch GPTCall, a platform that enables real-time voice conversations with Llama 2 and other open-source models! It supports both desktop and mobile browsers. See comments for details.
    submitted by /u/friuns [link] [comments]  ( 9 min )
    [D] Machine Learning Problem: Predictive Maintenance for Industrial Equipment
    I was given following problem in one of the machine learning interviews. I think I messed up there. Need your approach in answering this question. Problem Description: Imagine you are working for a manufacturing company that operates a large fleet of industrial machines. These machines are critical to production, and unexpected breakdowns can result in significant downtime and financial losses. Your task is to develop a predictive maintenance model using machine learning to predict when a machine is likely to fail so that maintenance can be performed just in time to prevent a breakdown. Data: You are provided with historical data for each machine, including sensor readings, maintenance logs, and failure records. The dataset is extensive, containing millions of data points over several ye…  ( 10 min )
    [D] RVC "queue" stuck for over 2 and a half hours?
    Does anyone know what this means if when you import your audio clip and the model you want to use it gets stuck in the queue for over 2 and a half hours? I know that can't be right but I seemed to have followed all the guides correctly not sure what it could be :'( help greatly appreciated submitted by /u/StuntGuy [link] [comments]  ( 9 min )
  • Open

    Belt around an elliptical waist
    I just saw a tweet from Dave Richeson saying I remember as a kid calculating the size difference (diameter) of a belt between each hole. Now I think about it every time I wear a belt. Holes 1 inch apart change the diameter by about one-third of an inch (1/π). [Assuming people have a circular […] Belt around an elliptical waist first appeared on John D. Cook.  ( 6 min )
    Recursive triangle subdivision
    The other day I saw where Cliff Pickover tweeted some images of triangles recursively subdivided by adding a point to the barycenter of each triangle. The images were not what I expected, so I wanted to reproduce the images to look into this further. Here are the first three steps: I set the alpha value […] Recursive triangle subdivision first appeared on John D. Cook.  ( 5 min )
  • Open

    "AI Unleashed: This Week's Top 15 News and Breakthroughs in Artificial Intelligence"
    "AI Unleashed: This Week's Top 15 News and Breakthroughs in Artificial Intelligence" 1"X's Data Revolution: Your Biometrics and Career Fueling the AI of Tomorrow!" In a recent privacy policy shake-up, X has just dropped a bombshell. They're not just interested in your regular data anymore; they're diving deep into the world of biometrics, job histories, and education backgrounds. And that's not all! Another corner of their revamped policy hints at a grand plan: they want to fuse this treasure trove of data with publicly available info to supercharge their machine learning and AI models. This isn't your run-of-the-mill update; it's a quantum leap for X in their quest to build the ultimate AI system. They're not just pushing boundaries; they're smashing through them. By expanding their da…  ( 16 min )
    AI technology behind ChatGPT built was in Iowa – with a lot of water
    The artificial intelligence technology behind ChatGPT was built in Iowa, specifically in the watershed of the Raccoon and Des Moines rivers. Microsoft-backed OpenAI needed a lot of water to cool its powerful supercomputer as it taught its AI systems how to mimic human writing. Leading tech developers like Microsoft, OpenAI, and Google have acknowledged the high costs associated with the growing demand for AI tools, including expensive semiconductors and increased water consumption. Microsoft disclosed a 34% spike in global water consumption, largely attributed to its AI research. A researcher estimates that ChatGPT uses 500 milliliters of water for every series of prompts or questions. Source : https://news.yahoo.com/artificial-intelligence-technology-behind-chatgpt-131421382.html Summarized by Nuse AI submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Teaching LLMs to be more reasonable
    Based on a bit of research and a lot of gut feeling, I offer the following speculation: if you self-trained an LLM with a Python interpreter or Java compiler in a feedback loop where it learned from its own mistakes then it could become dramatically better at coding. It's actually a miracle that they are "decent" at coding despite getting virtually no feedback from an interpreter or compiler. one could train not merely on input and output, but also on an execution trace so the LLM learned HOW the interpreter got the result one could also train the model on how to install and invoke open source software and thus it would learn about a variety of languages, versions and runtimes this might also improve its logical reasoning skills in general Admittedly, running programs is a lot more expensive than doing simple next-word prediction on pre-existing texts. But on the other hand, a corpus of a million program executions can also be used to train future LLMs. You can keep the execution information forever and re-use it as traditional next-token prediction input. submitted by /u/Smallpaul [link] [comments]  ( 9 min )
    Article - "As a writer, I’m afraid of capitalism — not ChatGPT."
    submitted by /u/LaVolpe223 [link] [comments]  ( 9 min )
    100% Free and unlimited...text-to-video AI with optional image reference
    submitted by /u/the_anonymizer [link] [comments]  ( 8 min )
    I made a free tool that allows you to create a personalized AI newsletter containing all of the content you already follow. The app will automatically pull in the top or latest posts from your selected sources so you don't miss anything important.
    submitted by /u/PlayfulPhilosopher42 [link] [comments]  ( 9 min )
    LLM with a voice interface?
    So LLM's like ChatGPT have been around for a while now, and have good APIs, and also voice to text dictation and text to voice generation are close to perfect these days... are there any services that join them all together? I'd like to chat with ChatGPT or similar by talking to my smart speaker a la Google Home, and have it respond by speaking, and to be able to reply and continue the conversation. Does anyone know if this exists yet? submitted by /u/singeblanc [link] [comments]  ( 9 min )
    AI subtitles?
    hey guys, im trying to subtitle a spanish video to have english subtitles, does anyone know a good way to do it for free? submitted by /u/deletemkw [link] [comments]  ( 9 min )
    "The 7 Stages of AI" | AI Uncovered
    submitted by /u/Tao_Dragon [link] [comments]  ( 9 min )
    In AI Regulation Coverage, Media Let Lawmakers Off the Hook
    The media often portrays lawmakers as unable to regulate artificial intelligence (AI) due to its complexity and evolving nature. This narrative overlooks the responsibility of lawmakers and their regulatory inertia. The media frames AI regulation as a matter of technical knowledge rather than moral consideration. The New York Times highlights the slow congressional response to new technologies and the potential influence of lawmakers' financial interests in AI companies. The media fails to question why lawmakers, who have profited from AI, cannot apply their knowledge to regulate it. This lack of critical information in news sources defends the inaction of lawmakers. Source : https://fair.org/home/in-ai-regulation-coverage-media-let-lawmakers-off-the-hook/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    NVIDIA TensorRT-LLM Supercharges Large Language Model Inference on NVIDIA H100 GPUs
    submitted by /u/basitmakine [link] [comments]  ( 9 min )
    One-Minute Daily AI News 9/8/2023
    TIME just picked a list of 100 Most Influential People in AI.[1] AI Startup Imbue Tops $1 Billion Valuation After Funding from Nvidia.[2] Microsoft offers legal protection for AI copyright infringement challenges.[3] US chipmaker Nvidia Corp. on Friday announced separate partnerships with Reliance and Tata group companies to help them develop AI-powered supercomputers, AI clouds and generative AI applications.[4] Sources: [1] https://time.com/6311323/how-we-chose-time100-ai/ [2] https://www.bloomberg.com/news/articles/2023-09-07/ai-startup-imbue-tops-1-billion-valuation-after-funding-from-nvidia?embedded-checkout=true [3] https://arstechnica.com/information-technology/2023/09/microsoft-offers-legal-protection-for-ai-copyright-infringement-challenges/ [4] https://www.livemint.com/companies/news/reliance-tata-cos-sign-ai-partnerships-with-nvidia-11694198851600.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    RVC "queue" is stuck for over 2 and a half hours now?
    Does anyone know what this means if when you import your audio clip and the model you want to use it gets stuck in the queue for over 2 and a half hours? I know that can't be right but I seemed to have followed all the guides correctly not sure what it could be :'( help greatly appreciated submitted by /u/StuntGuy [link] [comments]  ( 9 min )
    ChatGPT hype is fading ! Traffic drops from 1.6 bn to 1.4 bn users in 3 months
    submitted by /u/Agitated-Spell3979 [link] [comments]  ( 9 min )
    Paige partners with Microsoft to construct world's largest AI model for battling cancer
    Healthcare technology disruptor Paige is teaming up with Microsoft in the race against cancer. The collaboration aims to revolutionize cancer diagnosis and patient care by building the largest image-based artificial intelligence model for digital pathology and oncology. To stay one step ahead in AI transformations, subscribe here. Transforming cancer imaging Using Microsoft’s advanced supercomputing infrastructure, Paige aims to take cancer imaging to the next level. By combining its deep AI expertise with Microsoft’s enormous computing power, this model augments accuracy and brings in novel capabilities in cancer diagnostics. In the next phase, Paige will incorporate up to four million digitized microscopy slides from its petabyte-scale clinical data archive. A milestone in oncology The Paige and Microsoft partnership is a game changer in advancing healthcare, equipping physicians with unprecedented insights into cancer pathology. Paige is the first company to receive FDA approval for a clinical AI application in digital pathology. The technology is set to increase diagnostic confidence, productivity, and expand treatment options for patients worldwide. (source) P.S. If you love these analyses, I write a free newsletter to track the most significant news and research in AI and tech. Experts from Google, Meta, OpenAI, and more read it daily. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    The latest Tesla self-driving car iteration is a behavior-cloning NN
    submitted by /u/gwern [link] [comments]  ( 9 min )
    Multi Agent RL Project Ideas/Implementation
    I'm looking for some ideas on Multi Agent RL that preferably involve Robotics. I've came up with two ideas based on essentially similar themes: 1) Multiple robots tasked with cleaning a large room (with obstacles) 2) Multiple robots tasked with a search and rescue like mission in a particular area. Both are basically applications of n agents trying to collectively cover a region. Can someone recommend some frameworks and libraries that can allow me to simulate these ideas? Also, I'd love to hear some other ideas as well which use multi-agent RL for robotic applications. For now I'm only targeting a simulation based project. If I get time later I'd love to implement them on hardware as well. Thanks in advance! submitted by /u/esem29 [link] [comments]  ( 9 min )
    New Textbook "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches"
    New introduction textbook titled "Multi-Agent Reinforcement Learning: Foundations and Modern Approaches" by Stefano V. Albrecht, Filippos Christianos, Lukas Schäfer, to be published by MIT Press. The book draft can be downloaded here: https://www.marl-book.com/ ​ submitted by /u/vuttigiquoje-4292 [link] [comments]  ( 9 min )
    A simple analysis of why IPPO performs better than MAPPO in MARL
    To review IPPO vs. MAPPO, first, let's revisit the paper "Is Independent Learning All You Need in the StarCraft Multi-Agent Challenge?". ( https://arxiv.org/abs/2011.09533 ) The paper shows that simply applying PPO to the SMAC task to construct an IPPO (independent PPO) algorithm like IQL can surpass QMIX on SMAC. IPPO demonstrates the effectiveness of applying PPO to multi-agent systems. The paper further extends IPPO to MAPPO. The difference is that the critic of PPO uses the global state instead of the observation as input. Surprisingly, the global information does not enhance the actual performance of IPPO. ​ https://preview.redd.it/1wqqlj9z05nb1.png?width=1440&format=png&auto=webp&s=0d6ca1faa0e872151abdb6bb1e48884c6b51e71a ​ https://preview.redd.it/ei9uacl015nb1.png?width=1440&fo…  ( 10 min )
  • Open

    DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection. (arXiv:2309.03893v1 [cs.CV])
    Data is the cornerstone of deep learning. This paper reveals that the recently developed Diffusion Model is a scalable data engine for object detection. Existing methods for scaling up detection-oriented data often require manual collection or generative models to obtain target images, followed by data augmentation and labeling to produce training pairs, which are costly, complex, or lacking diversity. To address these issues, we presentDiffusionEngine (DE), a data scaling-up engine that provides high-quality detection-oriented training pairs in a single stage. DE consists of a pre-trained diffusion model and an effective Detection-Adapter, contributing to generating scalable, diverse and generalizable detection data in a plug-and-play manner. Detection-Adapter is learned to align the implicit semantic and location knowledge in off-the-shelf diffusion models with detection-aware signals to make better bounding-box predictions. Additionally, we contribute two datasets, i.e., COCO-DE and VOC-DE, to scale up existing detection benchmarks for facilitating follow-up research. Extensive experiments demonstrate that data scaling-up via DE can achieve significant improvements in diverse scenarios, such as various detection algorithms, self-supervised pre-training, data-sparse, label-scarce, cross-domain, and semi-supervised learning. For example, when using DE with a DINO-based adapter to scale up data, mAP is improved by 3.1% on COCO, 7.6% on VOC, and 11.5% on Clipart.  ( 2 min )
    M3FGM:a node masking and multi-granularity message passing-based federated graph model for spatial-temporal data prediction. (arXiv:2210.16193v3 [cs.LG] UPDATED)
    Researchers are solving the challenges of spatial-temporal prediction by combining Federated Learning (FL) and graph models with respect to the constrain of privacy and security. In order to make better use of the power of graph model, some researchs also combine split learning(SL). However, there are still several issues left unattended: 1) Clients might not be able to access the server during inference phase; 2) The graph of clients designed manually in the server model may not reveal the proper relationship between clients. This paper proposes a new GNN-oriented split federated learning method, named node {\bfseries M}asking and {\bfseries M}ulti-granularity {\bfseries M}essage passing-based Federated Graph Model (M$^3$FGM) for the above issues. For the first issue, the server model of M$^3$FGM employs a MaskNode layer to simulate the case of clients being offline. We also redesign the decoder of the client model using a dual-sub-decoders structure so that each client model can use its local data to predict independently when offline. As for the second issue, a new GNN layer named Multi-Granularity Message Passing (MGMP) layer enables each client node to perceive global and local information. We conducted extensive experiments in two different scenarios on two real traffic datasets. Results show that M$^3$FGM outperforms the baselines and variant models, achieves the best results in both datasets and scenarios.  ( 3 min )
    Mixup-Augmented Meta-Learning for Sample-Efficient Fine-Tuning of Protein Simulators. (arXiv:2308.15116v2 [cs.LG] UPDATED)
    Molecular dynamics simulations have emerged as a fundamental instrument for studying biomolecules. At the same time, it is desirable to perform simulations of a collection of particles under various conditions in which the molecules can fluctuate. In this paper, we explore and adapt the soft prompt-based learning method to molecular dynamics tasks. Our model can remarkably generalize to unseen and out-of-distribution scenarios with limited training data. While our work focuses on temperature as a test case, the versatility of our approach allows for efficient simulation through any continuous dynamic conditions, such as pressure and volumes. Our framework has two stages: 1) Pre-trains with data mixing technique, augments molecular structure data and temperature prompts, then applies a curriculum learning method by increasing the ratio of them smoothly. 2) Meta-learning-based fine-tuning framework improves sample-efficiency of fine-tuning process and gives the soft prompt-tuning better initialization points. Comprehensive experiments reveal that our framework excels in accuracy for in-domain data and demonstrates strong generalization capabilities for unseen and out-of-distribution samples.  ( 2 min )
    Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem. (arXiv:2002.01444v5 [math.OC] UPDATED)
    There has been much recent progress in forecasting the next observation of a linear dynamical system (LDS), which is known as the improper learning, as well as in the estimation of its system matrices, which is known as the proper learning of LDS. We present an approach to proper learning of LDS, which in spite of the non-convexity of the problem, guarantees global convergence of numerical solutions to a least-squares estimator. We present promising computational results.  ( 2 min )
    Blink: Link Local Differential Privacy in Graph Neural Networks via Bayesian Estimation. (arXiv:2309.03190v2 [cs.LG] UPDATED)
    Graph neural networks (GNNs) have gained an increasing amount of popularity due to their superior capability in learning node embeddings for various graph inference tasks, but training them can raise privacy concerns. To address this, we propose using link local differential privacy over decentralized nodes, enabling collaboration with an untrusted server to train GNNs without revealing the existence of any link. Our approach spends the privacy budget separately on links and degrees of the graph for the server to better denoise the graph topology using Bayesian estimation, alleviating the negative impact of LDP on the accuracy of the trained GNNs. We bound the mean absolute error of the inferred link probabilities against the ground truth graph topology. We then propose two variants of our LDP mechanism complementing each other in different privacy settings, one of which estimates fewer links under lower privacy budgets to avoid false positive link estimates when the uncertainty is high, while the other utilizes more information and performs better given relatively higher privacy budgets. Furthermore, we propose a hybrid variant that combines both strategies and is able to perform better across different privacy budgets. Extensive experiments show that our approach outperforms existing methods in terms of accuracy under varying privacy budgets.  ( 3 min )
    AtmoRep: A stochastic model of atmosphere dynamics using large scale representation learning. (arXiv:2308.13280v2 [physics.ao-ph] UPDATED)
    The atmosphere affects humans in a multitude of ways, from loss of life due to adverse weather effects to long-term social and economic impacts on societies. Computer simulations of atmospheric dynamics are, therefore, of great importance for the well-being of our and future generations. Here, we propose AtmoRep, a novel, task-independent stochastic computer model of atmospheric dynamics that can provide skillful results for a wide range of applications. AtmoRep uses large-scale representation learning from artificial intelligence to determine a general description of the highly complex, stochastic dynamics of the atmosphere from the best available estimate of the system's historical trajectory as constrained by observations. This is enabled by a novel self-supervised learning objective and a unique ensemble that samples from the stochastic model with a variability informed by the one in the historical record. The task-independent nature of AtmoRep enables skillful results for a diverse set of applications without specifically training for them and we demonstrate this for nowcasting, temporal interpolation, model correction, and counterfactuals. We also show that AtmoRep can be improved with additional data, for example radar observations, and that it can be extended to tasks such as downscaling. Our work establishes that large-scale neural networks can provide skillful, task-independent models of atmospheric dynamics. With this, they provide a novel means to make the large record of atmospheric observations accessible for applications and for scientific inquiry, complementing existing simulations based on first principles.  ( 3 min )
    Privacy-preserving Continual Federated Clustering via Adaptive Resonance Theory. (arXiv:2309.03487v1 [cs.LG])
    With the increasing importance of data privacy protection, various privacy-preserving machine learning methods have been proposed. In the clustering domain, various algorithms with a federated learning framework (i.e., federated clustering) have been actively studied and showed high clustering performance while preserving data privacy. However, most of the base clusterers (i.e., clustering algorithms) used in existing federated clustering algorithms need to specify the number of clusters in advance. These algorithms, therefore, are unable to deal with data whose distributions are unknown or continually changing. To tackle this problem, this paper proposes a privacy-preserving continual federated clustering algorithm. In the proposed algorithm, an adaptive resonance theory-based clustering algorithm capable of continual learning is used as a base clusterer. Therefore, the proposed algorithm inherits the ability of continual learning. Experimental results with synthetic and real-world datasets show that the proposed algorithm has superior clustering performance to state-of-the-art federated clustering algorithms while realizing data privacy protection and continual learning ability. The source code is available at \url{https://github.com/Masuyama-lab/FCAC}.  ( 2 min )
    Explanation Shift: How Did the Distribution Shift Impact the Model?. (arXiv:2303.08081v2 [cs.LG] UPDATED)
    As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions or model prediction distributions and try to understand issues regarding the interactions between learned models and shifting distributions. We suggest a novel approach that models how explanation characteristics shift when affected by distribution shifts. We find that the modeling of explanation shifts can be a better indicator for detecting out-of-distribution model behaviour than state-of-the-art techniques. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We provide an algorithmic method that allows us to inspect the interaction between data set features and learned models and compare them to the state-of-the-art. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.  ( 2 min )
    Revisiting Hidden Representations in Transfer Learning for Medical Imaging. (arXiv:2302.08272v2 [cs.CV] UPDATED)
    While a key component to the success of deep learning is the availability of massive amounts of training data, medical image datasets are often limited in diversity and size. Transfer learning has the potential to bridge the gap between related yet different domains. For medical applications, however, it remains unclear whether it is more beneficial to pre-train on natural or medical images. We aim to shed light on this problem by comparing initialization on ImageNet and RadImageNet on seven medical classification tasks. Our work includes a replication study, which yields results contrary to previously published findings. In our experiments, ResNet50 models pre-trained on ImageNet tend to outperform those trained on RadImageNet. To gain further insights, we investigate the learned representations using Canonical Correlation Analysis (CCA) and compare the predictions of the different models. Our results indicate that, contrary to intuition, ImageNet and RadImageNet may converge to distinct intermediate representations, which appear to diverge further during fine-tuning. Despite these distinct representations, the predictions of the models remain similar. Our findings show that the similarity between networks before and after fine-tuning does not correlate with performance gains, suggesting that the advantages of transfer learning might not solely originate from the reuse of features in the early layers of a convolutional neural network.
    A Tutorial on the Non-Asymptotic Theory of System Identification. (arXiv:2309.03873v1 [eess.SY])
    This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.
    Medoid Silhouette clustering with automatic cluster number selection. (arXiv:2309.03751v1 [cs.LG])
    The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.
    Knowledge Distillation Layer that Lets the Student Decide. (arXiv:2309.02843v1 [cs.CV] CROSS LISTED)
    Typical technique in knowledge distillation (KD) is regularizing the learning of a limited capacity model (student) by pushing its responses to match a powerful model's (teacher). Albeit useful especially in the penultimate layer and beyond, its action on student's feature transform is rather implicit, limiting its practice in the intermediate layers. To explicitly embed the teacher's knowledge in feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling to discard nuisance information, and ii) feeding forward the transferred knowledge deeper. Thus, the student enjoys the teacher's knowledge during the inference besides training. Formally, we repurpose 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on 3 popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework
    On the dynamics of multi agent nonlinear filtering and learning. (arXiv:2309.03557v1 [stat.ML])
    Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour is given. Importantly, application of the so derived framework in distributed and federated learning scenarios are presented.
    Enhancing Pipeline-Based Conversational Agents with Large Language Models. (arXiv:2309.03748v1 [cs.CL])
    The latest advancements in AI and deep learning have led to a breakthrough in large language model (LLM)-based agents such as GPT-4. However, many commercial conversational agent development tools are pipeline-based and have limitations in holding a human-like conversation. This paper investigates the capabilities of LLMs to enhance pipeline-based conversational agents during two phases: 1) in the design and development phase and 2) during operations. In 1) LLMs can aid in generating training data, extracting entities and synonyms, localization, and persona design. In 2) LLMs can assist in contextualization, intent classification to prevent conversational breakdown and handle out-of-scope questions, auto-correcting utterances, rephrasing responses, formulating disambiguation questions, summarization, and enabling closed question-answering capabilities. We conducted informal experiments with GPT-4 in the private banking domain to demonstrate the scenarios above with a practical example. Companies may be hesitant to replace their pipeline-based agents with LLMs entirely due to privacy concerns and the need for deep integration within their existing ecosystems. A hybrid approach in which LLMs' are integrated into the pipeline-based agents allows them to save time and costs of building and running agents by capitalizing on the capabilities of LLMs while retaining the integration and privacy safeguards of their existing systems.
    How to select an objective function using information theory. (arXiv:2212.06566v2 [cs.LG] UPDATED)
    In machine learning or scientific computing, model performance is measured with an objective function. But why choose one objective over another? Information theory gives one answer: To maximize the information in the model, select the most likely objective function or whichever represents the error in the fewest bits. To evaluate different objectives, transform them into likelihood functions. As likelihoods, their relative magnitudes represent how much we should prefer one objective versus another, and the log of their magnitude represents the expected uncertainty of the model.
    Equal Long-term Benefit Rate: Adapting Static Fairness Notions to Sequential Decision Making. (arXiv:2309.03426v1 [cs.LG])
    Decisions made by machine learning models may have lasting impacts over time, making long-term fairness a crucial consideration. It has been shown that when ignoring the long-term effect, naively imposing fairness criterion in static settings can actually exacerbate bias over time. To explicitly address biases in sequential decision-making, recent works formulate long-term fairness notions in Markov Decision Process (MDP) framework. They define the long-term bias to be the sum of static bias over each time step. However, we demonstrate that naively summing up the step-wise bias can cause a false sense of fairness since it fails to consider the importance difference of different time steps during transition. In this work, we introduce a long-term fairness notion called Equal Long-term Benefit Rate (ELBERT), which explicitly considers varying temporal importance and adapts static fairness principles to the sequential setting. Moreover, we show that the policy gradient of Long-term Benefit Rate can be analytically reduced to standard policy gradient. This makes standard policy optimization methods applicable for reducing the bias, leading to our proposed bias mitigation method ELBERT-PO. Experiments on three sequential decision making environments show that ELBERT-PO significantly reduces bias and maintains high utility. Code is available at https://github.com/Yuancheng-Xu/ELBERT.
    Training Acceleration of Low-Rank Decomposed Networks using Sequential Freezing and Rank Quantization. (arXiv:2309.03824v1 [cs.LG])
    Low Rank Decomposition (LRD) is a model compression technique applied to the weight tensors of deep learning models in order to reduce the number of trainable parameters and computational complexity. However, due to high number of new layers added to the architecture after applying LRD, it may not lead to a high training/inference acceleration if the decomposition ranks are not small enough. The issue is that using small ranks increases the risk of significant accuracy drop after decomposition. In this paper, we propose two techniques for accelerating low rank decomposed models without requiring to use small ranks for decomposition. These methods include rank optimization and sequential freezing of decomposed layers. We perform experiments on both convolutional and transformer-based models. Experiments show that these techniques can improve the model throughput up to 60% during training and 37% during inference when combined together while preserving the accuracy close to that of the original models
    Achieving Occam's Razor: Deep Learning for Optimal Model Reduction. (arXiv:2303.13746v2 [cs.LG] UPDATED)
    All fields of science depend on mathematical models. Occam's razor refers to the principle that good models should exclude parameters beyond those minimally required to describe the systems they represent. This is because redundancy can lead to incorrect estimates of model parameters from data, and thus inaccurate or ambiguous conclusions. Here, we show how deep learning can be powerfully leveraged to address Occam's razor. FixFit, our new method, uses a feedforward deep neural network with a bottleneck layer to characterize and predict the behavior of a given model from its input parameters. FixFit has three major benefits. First, it provides a metric to quantify the original model's degree of complexity. Second, it allows for the unique fitting of data. Third, it provides an unbiased way to discriminate between experimental hypotheses that add value versus those that do not. In two use cases, we demonstrate the broad applicability of this method across scientific domains. To validate the method using a known system, we apply FixFit to recover known composite parameters for the Kepler orbit model. To illustrate how the method can be applied to less well-established fields, we use it to identify parameters for a multi-scale brain model and reduce the search space for viable candidate mechanisms.
    Deep Learning Safety Concerns in Automated Driving Perception. (arXiv:2309.03774v1 [cs.LG])
    Recent advances in the field of deep learning and impressive performance of deep neural networks (DNNs) for perception have resulted in an increased demand for their use in automated driving (AD) systems. The safety of such systems is of utmost importance and thus requires to consider the unique properties of DNNs. In order to achieve safety of AD systems with DNN-based perception components in a systematic and comprehensive approach, so-called safety concerns have been introduced as a suitable structuring element. On the one hand, the concept of safety concerns is -- by design -- well aligned to existing standards relevant for safety of AD systems such as ISO 21448 (SOTIF). On the other hand, it has already inspired several academic publications and upcoming standards on AI safety such as ISO PAS 8800. While the concept of safety concerns has been previously introduced, this paper extends and refines it, leveraging feedback from various domain and safety experts in the field. In particular, this paper introduces an additional categorization for a better understanding as well as enabling cross-functional teams to jointly address the concerns.
    Evaluating the Efficacy of Supervised Learning vs Large Language Models for Identifying Cognitive Distortions and Suicidal Risks in Chinese Social Media. (arXiv:2309.03564v1 [cs.CL])
    Large language models, particularly those akin to the rapidly progressing GPT series, are gaining traction for their expansive influence. While there is keen interest in their applicability within medical domains such as psychology, tangible explorations on real-world data remain scant. Concurrently, users on social media platforms are increasingly vocalizing personal sentiments; under specific thematic umbrellas, these sentiments often manifest as negative emotions, sometimes escalating to suicidal inclinations. Timely discernment of such cognitive distortions and suicidal risks is crucial to effectively intervene and potentially avert dire circumstances. Our study ventured into this realm by experimenting on two pivotal tasks: suicidal risk and cognitive distortion identification on Chinese social media platforms. Using supervised learning as a baseline, we examined and contrasted the efficacy of large language models via three distinct strategies: zero-shot, few-shot, and fine-tuning. Our findings revealed a discernible performance gap between the large language models and traditional supervised learning approaches, primarily attributed to the models' inability to fully grasp subtle categories. Notably, while GPT-4 outperforms its counterparts in multiple scenarios, GPT-3.5 shows significant enhancement in suicide risk classification after fine-tuning. To our knowledge, this investigation stands as the maiden attempt at gauging large language models on Chinese social media tasks. This study underscores the forward-looking and transformative implications of using large language models in the field of psychology. It lays the groundwork for future applications in psychological research and practice.
    Punctate White Matter Lesion Segmentation in Preterm Infants Powered by Counterfactually Generative Learning. (arXiv:2309.03440v1 [eess.IV])
    Accurate segmentation of punctate white matter lesions (PWMLs) are fundamental for the timely diagnosis and treatment of related developmental disorders. Automated PWMLs segmentation from infant brain MR images is challenging, considering that the lesions are typically small and low-contrast, and the number of lesions may dramatically change across subjects. Existing learning-based methods directly apply general network architectures to this challenging task, which may fail to capture detailed positional information of PWMLs, potentially leading to severe under-segmentations. In this paper, we propose to leverage the idea of counterfactual reasoning coupled with the auxiliary task of brain tissue segmentation to learn fine-grained positional and morphological representations of PWMLs for accurate localization and segmentation. A simple and easy-to-implement deep-learning framework (i.e., DeepPWML) is accordingly designed. It combines the lesion counterfactual map with the tissue probability map to train a lightweight PWML segmentation network, demonstrating state-of-the-art performance on a real-clinical dataset of infant T1w MR images. The code is available at \href{https://github.com/ladderlab-xjtu/DeepPWML}{https://github.com/ladderlab-xjtu/DeepPWML}.
    M(otion)-mode Based Prediction of Ejection Fraction using Echocardiograms. (arXiv:2309.03759v1 [eess.IV])
    Early detection of cardiac dysfunction through routine screening is vital for diagnosing cardiovascular diseases. An important metric of cardiac function is the left ventricular ejection fraction (EF), where lower EF is associated with cardiomyopathy. Echocardiography is a popular diagnostic tool in cardiology, with ultrasound being a low-cost, real-time, and non-ionizing technology. However, human assessment of echocardiograms for calculating EF is time-consuming and expertise-demanding, raising the need for an automated approach. In this work, we propose using the M(otion)-mode of echocardiograms for estimating the EF and classifying cardiomyopathy. We generate multiple artificial M-mode images from a single echocardiogram and combine them using off-the-shelf model architectures. Additionally, we extend contrastive learning (CL) to cardiac imaging to learn meaningful representations from exploiting structures in unlabeled data allowing the model to achieve high accuracy, even with limited annotations. Our experiments show that the supervised setting converges with only ten modes and is comparable to the baseline method while bypassing its cumbersome training process and being computationally much more efficient. Furthermore, CL using M-mode images is helpful for limited data scenarios, such as having labels for only 200 patients, which is common in medical applications.
    Conformal Autoregressive Generation: Beam Search with Coverage Guarantees. (arXiv:2309.03797v1 [cs.LG])
    We introduce two new extensions to the beam search algorithm based on conformal predictions (CP) to produce sets of sequences with theoretical coverage guarantees. The first method is very simple and proposes dynamically-sized subsets of beam search results but, unlike typical CP procedures, has an upper bound on the achievable guarantee depending on a post-hoc calibration measure. Our second algorithm introduces the conformal set prediction procedure as part of the decoding process, producing a variable beam width which adapts to the current uncertainty. While more complex, this procedure can achieve coverage guarantees selected a priori. We provide marginal coverage bounds for each method, and evaluate them empirically on a selection of tasks drawing from natural language processing and chemistry.
    REBOOT: Reuse Data for Bootstrapping Efficient Real-World Dexterous Manipulation. (arXiv:2309.03322v1 [cs.LG])
    Dexterous manipulation tasks involving contact-rich interactions pose a significant challenge for both model-based control systems and imitation learning algorithms. The complexity arises from the need for multi-fingered robotic hands to dynamically establish and break contacts, balance non-prehensile forces, and control large degrees of freedom. Reinforcement learning (RL) offers a promising approach due to its general applicability and capacity to autonomously acquire optimal manipulation strategies. However, its real-world application is often hindered by the necessity to generate a large number of samples, reset the environment, and obtain reward signals. In this work, we introduce an efficient system for learning dexterous manipulation skills with RL to alleviate these challenges. The main idea of our approach is the integration of recent advances in sample-efficient RL and replay buffer bootstrapping. This combination allows us to utilize data from different tasks or objects as a starting point for training new tasks, significantly improving learning efficiency. Additionally, our system completes the real-world training cycle by incorporating learned resets via an imitation-based pickup policy as well as learned reward functions, eliminating the need for manual resets and reward engineering. We demonstrate the benefits of reusing past data as replay buffer initialization for new tasks, for instance, the fast acquisition of intricate manipulation skills in the real world on a four-fingered robotic hand. (Videos: https://sites.google.com/view/reboot-dexterous)
    Natural Example-Based Explainability: a Survey. (arXiv:2309.03234v1 [cs.AI])
    Explainable Artificial Intelligence (XAI) has become increasingly significant for improving the interpretability and trustworthiness of machine learning models. While saliency maps have stolen the show for the last few years in the XAI field, their ability to reflect models' internal processes has been questioned. Although less in the spotlight, example-based XAI methods have continued to improve. It encompasses methods that use examples as explanations for a machine learning model's predictions. This aligns with the psychological mechanisms of human reasoning and makes example-based explanations natural and intuitive for users to understand. Indeed, humans learn and reason by forming mental representations of concepts based on examples. This paper provides an overview of the state-of-the-art in natural example-based XAI, describing the pros and cons of each approach. A "natural" example simply means that it is directly drawn from the training data without involving any generative process. The exclusion of methods that require generating examples is justified by the need for plausibility which is in some regards required to gain a user's trust. Consequently, this paper will explore the following family of methods: similar examples, counterfactual and semi-factual, influential instances, prototypes, and concepts. In particular, it will compare their semantic definition, their cognitive impact, and added values. We hope it will encourage and facilitate future work on natural example-based XAI.
    The Space of Adversarial Strategies. (arXiv:2209.04521v2 [cs.CR] UPDATED)
    Adversarial examples, inputs designed to induce worst-case behavior in machine learning models, have been extensively studied over the past decade. Yet, our understanding of this phenomenon stems from a rather fragmented pool of knowledge; at present, there are a handful of attacks, each with disparate assumptions in threat models and incomparable definitions of optimality. In this paper, we propose a systematic approach to characterize worst-case (i.e., optimal) adversaries. We first introduce an extensible decomposition of attacks in adversarial machine learning by atomizing attack components into surfaces and travelers. With our decomposition, we enumerate over components to create 576 attacks (568 of which were previously unexplored). Next, we propose the Pareto Ensemble Attack (PEA): a theoretical attack that upper-bounds attack performance. With our new attacks, we measure performance relative to the PEA on: both robust and non-robust models, seven datasets, and three extended lp-based threat models incorporating compute costs, formalizing the Space of Adversarial Strategies. From our evaluation we find that attack performance to be highly contextual: the domain, model robustness, and threat model can have a profound influence on attack efficacy. Our investigation suggests that future studies measuring the security of machine learning should: (1) be contextualized to the domain & threat models, and (2) go beyond the handful of known attacks used today.
    Spatio-Temporal Contrastive Self-Supervised Learning for POI-level Crowd Flow Inference. (arXiv:2309.03239v1 [cs.LG])
    Accurate acquisition of crowd flow at Points of Interest (POIs) is pivotal for effective traffic management, public service, and urban planning. Despite this importance, due to the limitations of urban sensing techniques, the data quality from most sources is inadequate for monitoring crowd flow at each POI. This renders the inference of accurate crowd flow from low-quality data a critical and challenging task. The complexity is heightened by three key factors: 1) \emph{The scarcity and rarity of labeled data}, 2) \emph{The intricate spatio-temporal dependencies among POIs}, and 3) \emph{The myriad correlations between precise crowd flow and GPS reports}. To address these challenges, we recast the crowd flow inference problem as a self-supervised attributed graph representation learning task and introduce a novel \underline{C}ontrastive \underline{S}elf-learning framework for \underline{S}patio-\underline{T}emporal data (\model). Our approach initiates with the construction of a spatial adjacency graph founded on the POIs and their respective distances. We then employ a contrastive learning technique to exploit large volumes of unlabeled spatio-temporal data. We adopt a swapped prediction approach to anticipate the representation of the target subgraph from similar instances. Following the pre-training phase, the model is fine-tuned with accurate crowd flow data. Our experiments, conducted on two real-world datasets, demonstrate that the \model pre-trained on extensive noisy data consistently outperforms models trained from scratch.
    Let Quantum Neural Networks Choose Their Own Frequencies. (arXiv:2309.03279v1 [quant-ph])
    Parameterized quantum circuits as machine learning models are typically well described by their representation as a partial Fourier series of the input features, with frequencies uniquely determined by the feature map's generator Hamiltonians. Ordinarily, these data-encoding generators are chosen in advance, fixing the space of functions that can be represented. In this work we consider a generalization of quantum models to include a set of trainable parameters in the generator, leading to a trainable frequency (TF) quantum model. We numerically demonstrate how TF models can learn generators with desirable properties for solving the task at hand, including non-regularly spaced frequencies in their spectra and flexible spectral richness. Finally, we showcase the real-world effectiveness of our approach, demonstrating an improved accuracy in solving the Navier-Stokes equations using a TF model with only a single parameter added to each encoding operation. Since TF models encompass conventional fixed frequency models, they may offer a sensible default choice for variational quantum machine learning.
    A comparison of rational and neural network based approximations. (arXiv:2303.04436v2 [math.OC] UPDATED)
    Rational and neural network based approximations are efficient tools in modern approximation. These approaches are able to produce accurate approximations to nonsmooth and non-Lipschitz functions, including multivariate domain functions. In this paper we compare the efficiency of function approximation using rational approximation, neural network and their combinations. It was found that rational approximation is superior to neural network based approaches with the same number of decision variables. Our numerical experiments demonstrate the efficiency of rational approximation, even when the number of approximation parameters (that is, the dimension of the corresponding optimisation problems) is small. Another important contribution of this paper lies in the improvement of rational approximation algorithms. Namely, the optimisation based algorithms for rational approximation can be adjusted to in such a way that the conditioning number of the constraint matrices are controlled. This simple adjustment enables us to work with high dimension optimisation problems and improve the design of the neural network. The main strength of neural networks is in their ability to handle models with a large number of variables: complex models are decomposed in several simple optimisation problems. Therefore the the large number of decision variables is in the nature of neural networks.
    Fitness Approximation through Machine Learning. (arXiv:2309.03318v1 [cs.NE])
    We present a novel approach to performing fitness approximation in genetic algorithms (GAs) using machine-learning (ML) models, focusing on evolutionary agents in Gymnasium (game) simulators -- where fitness computation is costly. Maintaining a dataset of sampled individuals along with their actual fitness scores, we continually update throughout an evolutionary run a fitness-approximation ML model. We compare different methods for: 1) switching between actual and approximate fitness, 2) sampling the population, and 3) weighting the samples. Experimental findings demonstrate significant improvement in evolutionary runtimes, with fitness scores that are either identical or slightly lower than that of the fully run GA -- depending on the ratio of approximate-to-actual-fitness computation. Our approach is generic and can be easily applied to many different domains.
    Dynamic Causal Graph Convolutional Network for Traffic Prediction. (arXiv:2306.07019v2 [cs.LG] UPDATED)
    Modeling complex spatiotemporal dependencies in correlated traffic series is essential for traffic prediction. While recent works have shown improved prediction performance by using neural networks to extract spatiotemporal correlations, their effectiveness depends on the quality of the graph structures used to represent the spatial topology of the traffic network. In this work, we propose a novel approach for traffic prediction that embeds time-varying dynamic Bayesian network to capture the fine spatiotemporal topology of traffic data. We then use graph convolutional networks to generate traffic forecasts. To enable our method to efficiently model nonlinear traffic propagation patterns, we develop a deep learning-based module as a hyper-network to generate stepwise dynamic causal graphs. Our experimental results on a real traffic dataset demonstrate the superior prediction performance of the proposed method. The code is available at https://github.com/MonBG/DCGCN.
    Acoustic-to-articulatory inversion for dysarthric speech: Are pre-trained self-supervised representations favorable?. (arXiv:2309.01108v2 [eess.AS] UPDATED)
    $ $Acoustic-to-articulatory inversion (AAI) involves mapping from the acoustic space to the articulatory space. Signal-processing features like the MFCCs, have been widely used for the AAI task. For subjects with dysarthric speech, AAI is challenging because of an imprecise and indistinct pronunciation. In this work, we perform AAI for dysarthric speech using representations from pre-trained self-supervised learning (SSL) models. We demonstrate the impact of different pre-trained features on this challenging AAI task, at low-resource conditions. In addition, we also condition x-vectors to the extracted SSL features to train a BLSTM network. In the seen case, we experiment with three AAI training schemes (subject-specific, pooled, and fine-tuned). The results, consistent across training schemes, reveal that DeCoAR, in the fine-tuned scheme, achieves a relative improvement of the Pearson Correlation Coefficient (CC) by ${\sim}$1.81\% and ${\sim}$4.56\% for healthy controls and patients, respectively, over MFCCs. In the unseen case, we observe similar average trends for different SSL features. Overall, SSL networks like wav2vec, APC, and DeCoAR, which are trained with feature reconstruction or future timestep prediction tasks, perform well in predicting dysarthric articulatory trajectories.
    Evaluating Explanation Methods for Multivariate Time Series Classification. (arXiv:2308.15223v2 [cs.LG] UPDATED)
    Multivariate time series classification is an important computational task arising in applications where data is recorded over time and over multiple channels. For example, a smartwatch can record the acceleration and orientation of a person's motion, and these signals are recorded as multivariate time series. We can classify this data to understand and predict human movement and various properties such as fitness levels. In many applications classification alone is not enough, we often need to classify but also understand what the model learns (e.g., why was a prediction given, based on what information in the data). The main focus of this paper is on analysing and evaluating explanation methods tailored to Multivariate Time Series Classification (MTSC). We focus on saliency-based explanation methods that can point out the most relevant channels and time series points for the classification decision. We analyse two popular and accurate multivariate time series classifiers, ROCKET and dResNet, as well as two popular explanation methods, SHAP and dCAM. We study these methods on 3 synthetic datasets and 2 real-world datasets and provide a quantitative and qualitative analysis of the explanations provided. We find that flattening the multivariate datasets by concatenating the channels works as well as using multivariate classifiers directly and adaptations of SHAP for MTSC work quite well. Additionally, we also find that the popular synthetic datasets we used are not suitable for time series analysis.
    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders. (arXiv:2202.09671v4 [stat.ML] UPDATED)
    Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generations.
    RatGPT: Turning online LLMs into Proxies for Malware Attacks. (arXiv:2308.09183v2 [cs.CR] UPDATED)
    The evolution of Generative AI and the capabilities of the newly released Large Language Models (LLMs) open new opportunities in software engineering. However, they also lead to new challenges in cybersecurity. Recently, researchers have shown the possibilities of using LLMs such as ChatGPT to generate malicious content that can directly be exploited or guide inexperienced hackers to weaponize tools and code. These studies covered scenarios that still require the attacker to be in the middle of the loop. In this study, we leverage openly available plugins and use an LLM as proxy between the attacker and the victim. We deliver a proof-of-concept where ChatGPT is used for the dissemination of malicious software while evading detection, alongside establishing the communication to a command and control (C2) server to receive commands to interact with a victim's system. Finally, we present the general approach as well as essential elements in order to stay undetected and make the attack a success. This proof-of-concept highlights significant cybersecurity issues with openly available plugins and LLMs, which require the development of security guidelines, controls, and mitigation strategies.
    Short-Term Load Forecasting Using A Particle-Swarm Optimized Multi-Head Attention-Augmented CNN-LSTM Network. (arXiv:2309.03694v1 [cs.LG])
    Short-term load forecasting is of paramount importance in the efficient operation and planning of power systems, given its inherent non-linear and dynamic nature. Recent strides in deep learning have shown promise in addressing this challenge. However, these methods often grapple with hyperparameter sensitivity, opaqueness in interpretability, and high computational overhead for real-time deployment. In this paper, I propose a novel solution that surmounts these obstacles. Our approach harnesses the power of the Particle-Swarm Optimization algorithm to autonomously explore and optimize hyperparameters, a Multi-Head Attention mechanism to discern the salient features crucial for accurate forecasting, and a streamlined framework for computational efficiency. Our method undergoes rigorous evaluation using a genuine electricity demand dataset. The results underscore its superiority in terms of accuracy, robustness, and computational efficiency. Notably, our Mean Absolute Percentage Error of 1.9376 marks a significant advancement over existing state-of-the-art approaches, heralding a new era in short-term load forecasting.
    Polynomial Bounds for Learning Noisy Optical Physical Unclonable Functions and Connections to Learning With Errors. (arXiv:2308.09199v2 [cs.LG] UPDATED)
    It is shown that a class of optical physical unclonable functions (PUFs) can be learned to arbitrary precision with arbitrarily high probability, even in the presence of noise, given access to polynomially many challenge-response pairs and polynomially bounded computational power, under mild assumptions about the distributions of the noise and challenge vectors. This extends the results of Rh\"uramir et al. (2013), who showed a subset of this class of PUFs to be learnable in polynomial time in the absence of noise, under the assumption that the optics of the PUF were either linear or had negligible nonlinear effects. We derive polynomial bounds for the required number of samples and the computational complexity of a linear regression algorithm, based on size parameters of the PUF, the distributions of the challenge and noise vectors, and the probability and accuracy of the regression algorithm, with a similar analysis to one done by Bootle et al. (2018), who demonstrated a learning attack on a poorly implemented version of the Learning With Errors problem.
    Impression-Informed Multi-Behavior Recommender System: A Hierarchical Graph Attention Approach. (arXiv:2309.03169v2 [cs.IR] UPDATED)
    While recommender systems have significantly benefited from implicit feedback, they have often missed the nuances of multi-behavior interactions between users and items. Historically, these systems either amalgamated all behaviors, such as \textit{impression} (formerly \textit{view}), \textit{add-to-cart}, and \textit{buy}, under a singular 'interaction' label, or prioritized only the target behavior, often the \textit{buy} action, discarding valuable auxiliary signals. Although recent advancements tried addressing this simplification, they primarily gravitated towards optimizing the target behavior alone, battling with data scarcity. Additionally, they tended to bypass the nuanced hierarchy intrinsic to behaviors. To bridge these gaps, we introduce the \textbf{H}ierarchical \textbf{M}ulti-behavior \textbf{G}raph Attention \textbf{N}etwork (HMGN). This pioneering framework leverages attention mechanisms to discern information from both inter and intra-behaviors while employing a multi-task Hierarchical Bayesian Personalized Ranking (HBPR) for optimization. Recognizing the need for scalability, our approach integrates a specialized multi-behavior sub-graph sampling technique. Moreover, the adaptability of HMGN allows for the seamless inclusion of knowledge metadata and time-series data. Empirical results attest to our model's prowess, registering a notable performance boost of up to 64\% in NDCG@100 metrics over conventional graph neural network methods.
    Bridging the Gap Between Target Networks and Functional Regularization. (arXiv:2106.02613v4 [stat.ML] UPDATED)
    Bootstrapping is behind much of the successes of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer which can be beneficial in some cases, but also have disadvantages such as being inflexible and can result in instabilities, even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and a convex regularizer in function space and we theoretically study its convergence. We conduct an experimental study across a range of environments, discount factors, and off-policiness data collections to investigate the effectiveness of the regularization induced by Target Networks and Functional Regularization in terms of performance, accuracy, and stability. Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and result in performance improvement. Furthermore, adjusting both the regularization weight and the network update period in Functional Regularization can result in further performance improvements compared to solely adjusting the network update period as typically done with Target Networks. Our approach also enhances the ability to networks to recover accurate $Q$-values.
    Empirical Risk Minimization for Losses without Variance. (arXiv:2309.03818v1 [stat.ML])
    This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
    Comparing Sequential Forecasters. (arXiv:2110.00115v5 [stat.ME] UPDATED)
    Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework, in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis -- whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.
    Domain Adaptation for Efficiently Fine-tuning Vision Transformer with Encrypted Images. (arXiv:2309.02556v2 [cs.CV] UPDATED)
    In recent years, deep neural networks (DNNs) trained with transformed data have been applied to various applications such as privacy-preserving learning, access control, and adversarial defenses. However, the use of transformed data decreases the performance of models. Accordingly, in this paper, we propose a novel method for fine-tuning models with transformed images under the use of the vision transformer (ViT). The proposed domain adaptation method does not cause the accuracy degradation of models, and it is carried out on the basis of the embedding structure of ViT. In experiments, we confirmed that the proposed method prevents accuracy degradation even when using encrypted images with the CIFAR-10 and CIFAR-100 datasets.
    Efficient anti-symmetrization of a neural network layer by taming the sign problem. (arXiv:2205.12250v2 [cs.LG] UPDATED)
    Explicit antisymmetrization of a neural network is a potential candidate for a universal function approximator for generic antisymmetric functions, which are ubiquitous in quantum physics. However, this procedure is a priori factorially costly to implement, making it impractical for large numbers of particles. The strategy also suffers from a sign problem. Namely, due to near-exact cancellation of positive and negative contributions, the magnitude of the antisymmetrized function may be significantly smaller than before anti-symmetrization. We show that the anti-symmetric projection of a two-layer neural network can be evaluated efficiently, opening the door to using a generic antisymmetric layer as a building block in anti-symmetric neural network Ansatzes. This approximation is effective when the sign problem is controlled, and we show that this property depends crucially the choice of activation function under standard Xavier/He initialization methods. As a consequence, using a smooth activation function requires re-scaling of the neural network weights compared to standard initializations.
    Unlearnable Examples Give a False Sense of Security: Piercing through Unexploitable Data with Learnable Examples. (arXiv:2305.09241v4 [cs.LG] UPDATED)
    Safeguarding data from unauthorized exploitation is vital for privacy and security, especially in recent rampant research in security breach such as adversarial/membership attacks. To this end, \textit{unlearnable examples} (UEs) have been recently proposed as a compelling protection, by adding imperceptible perturbation to data so that models trained on them cannot classify them accurately on original clean distribution. Unfortunately, we find UEs provide a false sense of security, because they cannot stop unauthorized users from utilizing other unprotected data to remove the protection, by turning unlearnable data into learnable again. Motivated by this observation, we formally define a new threat by introducing \textit{learnable unauthorized examples} (LEs) which are UEs with their protection removed. The core of this approach is a novel purification process that projects UEs onto the manifold of LEs. This is realized by a new joint-conditional diffusion model which denoises UEs conditioned on the pixel and perceptual similarity between UEs and LEs. Extensive experiments demonstrate that LE delivers state-of-the-art countering performance against both supervised UEs and unsupervised UEs in various scenarios, which is the first generalizable countermeasure to UEs across supervised learning and unsupervised learning. Our code is available at \url{https://github.com/jiangw-0/LE_JCDP}.
    Enhancing Deep Learning Models through Tensorization: A Comprehensive Survey and Framework. (arXiv:2309.02428v2 [cs.LG] UPDATED)
    The burgeoning growth of public domain data and the increasing complexity of deep learning model architectures have underscored the need for more efficient data representation and analysis techniques. This paper is motivated by the work of Helal (2023) and aims to present a comprehensive overview of tensorization. This transformative approach bridges the gap between the inherently multidimensional nature of data and the simplified 2-dimensional matrices commonly used in linear algebra-based machine learning algorithms. This paper explores the steps involved in tensorization, multidimensional data sources, various multiway analysis methods employed, and the benefits of these approaches. A small example of Blind Source Separation (BSS) is presented comparing 2-dimensional algorithms and a multiway algorithm in Python. Results indicate that multiway analysis is more expressive. Contrary to the intuition of the dimensionality curse, utilising multidimensional datasets in their native form and applying multiway analysis methods grounded in multilinear algebra reveal a profound capacity to capture intricate interrelationships among various dimensions while, surprisingly, reducing the number of model parameters and accelerating processing. A survey of the multi-away analysis methods and integration with various Deep Neural Networks models is presented using case studies in different domains.
    ArtiGrasp: Physically Plausible Synthesis of Bi-Manual Dexterous Grasping and Articulation. (arXiv:2309.03891v1 [cs.RO])
    We present ArtiGrasp, a novel method to synthesize bi-manual hand-object interactions that include grasping and articulation. This task is challenging due to the diversity of the global wrist motions and the precise finger control that are necessary to articulate objects. ArtiGrasp leverages reinforcement learning and physics simulations to train a policy that controls the global and local hand pose. Our framework unifies grasping and articulation within a single policy guided by a single hand pose reference. Moreover, to facilitate the training of the precise finger control required for articulation, we present a learning curriculum with increasing difficulty. It starts with single-hand manipulation of stationary objects and continues with multi-agent training including both hands and non-stationary objects. To evaluate our method, we introduce Dynamic Object Grasping and Articulation, a task that involves bringing an object into a target articulated pose. This task requires grasping, relocation, and articulation. We show our method's efficacy towards this task. We further demonstrate that our method can generate motions with noisy hand-object pose estimates from an off-the-shelf image-based regressor.
    USE-Evaluator: Performance Metrics for Medical Image Segmentation Models with Uncertain, Small or Empty Reference Annotations. (arXiv:2209.13008v4 [eess.IV] UPDATED)
    Performance metrics for medical image segmentation models are used to measure the agreement between the reference annotation and the predicted segmentation. Usually, overlap metrics, such as the Dice, are used as a metric to evaluate the performance of these models in order for results to be comparable. However, there is a mismatch between the distributions of cases and difficulty level of segmentation tasks in public data sets compared to clinical practice. Common metrics fail to measure the impact of this mismatch, especially for clinical data sets that include low signal pathologies, a difficult segmentation task, and uncertain, small, or empty reference annotations. This limitation may result in ineffective research of machine learning practitioners in designing and optimizing models. Dimensions of evaluating clinical value include consideration of the uncertainty of reference annotations, independence from reference annotation volume size, and evaluation of classification of empty reference annotations. We study how uncertain, small, and empty reference annotations influence the value of metrics for medical image segmentation on an in-house data set regardless of the model. We examine metrics behavior on the predictions of a standard deep learning framework in order to identify metrics with clinical value. We compare to a public benchmark data set (BraTS 2019) with a high-signal pathology and certain, larger, and no empty reference annotations. We may show machine learning practitioners, how uncertain, small, or empty reference annotations require a rethinking of the evaluation and optimizing procedures. The evaluation code was released to encourage further analysis of this topic. https://github.com/SophieOstmeier/UncertainSmallEmpty.git
    Learning from Demonstration via Probabilistic Diagrammatic Teaching. (arXiv:2309.03835v1 [cs.RO])
    Learning for Demonstration (LfD) enables robots to acquire new skills by imitating expert demonstrations, allowing users to communicate their instructions in an intuitive manner. Recent progress in LfD often relies on kinesthetic teaching or teleoperation as the medium for users to specify the demonstrations. Kinesthetic teaching requires physical handling of the robot, while teleoperation demands proficiency with additional hardware. This paper introduces an alternative paradigm for LfD called Diagrammatic Teaching. Diagrammatic Teaching aims to teach robots novel skills by prompting the user to sketch out demonstration trajectories on 2D images of the scene, these are then synthesised as a generative model of motion trajectories in 3D task space. Additionally, we present the Ray-tracing Probabilistic Trajectory Learning (RPTL) framework for Diagrammatic Teaching. RPTL extracts time-varying probability densities from the 2D sketches, applies ray-tracing to find corresponding regions in 3D Cartesian space, and fits a probabilistic model of motion trajectories to these regions. New motion trajectories, which mimic those sketched by the user, can then be generated from the probabilistic model. We empirically validate our framework both in simulation and on real robots, which include a fixed-base manipulator and a quadruped-mounted manipulator.
    Diffusion-EDFs: Bi-equivariant Denoising Generative Modeling on SE(3) for Visual Robotic Manipulation. (arXiv:2309.02685v2 [cs.RO] UPDATED)
    Recent studies have verified that equivariant methods can significantly improve the data efficiency, generalizability, and robustness in robot learning. Meanwhile, denoising diffusion-based generative modeling has recently gained significant attention as a promising approach for robotic manipulation learning from demonstrations with stochastic behaviors. In this paper, we present Diffusion-EDFs, a novel approach that incorporates spatial roto-translation equivariance, i.e., SE(3)-equivariance to diffusion generative modeling. By integrating SE(3)-equivariance into our model architectures, we demonstrate that our proposed method exhibits remarkable data efficiency, requiring only 5 to 10 task demonstrations for effective end-to-end training. Furthermore, our approach showcases superior generalizability compared to previous diffusion-based manipulation methods.
    Sparse Federated Training of Object Detection in the Internet of Vehicles. (arXiv:2309.03569v1 [cs.LG])
    As an essential component part of the Intelligent Transportation System (ITS), the Internet of Vehicles (IoV) plays a vital role in alleviating traffic issues. Object detection is one of the key technologies in the IoV, which has been widely used to provide traffic management services by analyzing timely and sensitive vehicle-related information. However, the current object detection methods are mostly based on centralized deep training, that is, the sensitive data obtained by edge devices need to be uploaded to the server, which raises privacy concerns. To mitigate such privacy leakage, we first propose a federated learning-based framework, where well-trained local models are shared in the central server. However, since edge devices usually have limited computing power, plus a strict requirement of low latency in IoVs, we further propose a sparse training process on edge devices, which can effectively lighten the model, and ensure its training efficiency on edge devices, thereby reducing communication overheads. In addition, due to the diverse computing capabilities and dynamic environment, different sparsity rates are applied to edge devices. To further guarantee the performance, we propose, FedWeg, an improved aggregation scheme based on FedAvg, which is designed by the inverse ratio of sparsity rates. Experiments on the real-life dataset using YOLO show that the proposed scheme can achieve the required object detection rate while saving considerable communication costs.
    OpinionGPT: Modelling Explicit Biases in Instruction-Tuned LLMs. (arXiv:2309.03876v1 [cs.CL])
    Instruction-tuned Large Language Models (LLMs) have recently showcased remarkable ability to generate fitting responses to natural language instructions. However, an open research question concerns the inherent biases of trained models and their responses. For instance, if the data used to tune an LLM is dominantly written by persons with a specific political bias, we might expect generated answers to share this bias. Current research work seeks to de-bias such models, or suppress potentially biased answers. With this demonstration, we take a different view on biases in instruction-tuning: Rather than aiming to suppress them, we aim to make them explicit and transparent. To this end, we present OpinionGPT, a web demo in which users can ask questions and select all biases they wish to investigate. The demo will answer this question using a model fine-tuned on text representing each of the selected biases, allowing side-by-side comparison. To train the underlying model, we identified 11 different biases (political, geographic, gender, age) and derived an instruction-tuning corpus in which each answer was written by members of one of these demographics. This paper presents OpinionGPT, illustrates how we trained the bias-aware model and showcases the web application (available at https://opiniongpt.informatik.hu-berlin.de).
    ImageBind-LLM: Multi-modality Instruction Tuning. (arXiv:2309.03905v1 [cs.MM])
    We present ImageBind-LLM, a multi-modality instruction tuning method of large language models (LLMs) via ImageBind. Existing works mainly focus on language and image instruction tuning, different from which, our ImageBind-LLM can respond to multi-modality conditions, including audio, 3D point clouds, video, and their embedding-space arithmetic by only image-text alignment training. During training, we adopt a learnable bind network to align the embedding space between LLaMA and ImageBind's image encoder. Then, the image features transformed by the bind network are added to word tokens of all layers in LLaMA, which progressively injects visual instructions via an attention-free and zero-initialized gating mechanism. Aided by the joint embedding of ImageBind, the simple image-text training enables our model to exhibit superior multi-modality instruction-following capabilities. During inference, the multi-modality inputs are fed into the corresponding ImageBind encoders, and processed by a proposed visual cache model for further cross-modal embedding enhancement. The training-free cache model retrieves from three million image features extracted by ImageBind, which effectively mitigates the training-inference modality discrepancy. Notably, with our approach, ImageBind-LLM can respond to instructions of diverse modalities and demonstrate significant language generation quality. Code is released at https://github.com/OpenGVLab/LLaMA-Adapter.
    Primal-Dual Contextual Bayesian Optimization for Control System Online Optimization with Time-Average Constraints. (arXiv:2304.06104v2 [cs.LG] UPDATED)
    This paper studies the problem of online performance optimization of constrained closed-loop control systems, where both the objective and the constraints are unknown black-box functions affected by exogenous time-varying contextual disturbances. A primal-dual contextual Bayesian optimization algorithm is proposed that achieves sublinear cumulative regret with respect to the dynamic optimal solution under certain regularity conditions. Furthermore, the algorithm achieves zero time-average constraint violation, ensuring that the average value of the constraint function satisfies the desired constraint. The method is applied to both sampled instances from Gaussian processes and a continuous stirred tank reactor parameter tuning problem; simulation results show that the method simultaneously provides close-to-optimal performance and maintains constraint feasibility on average. This contrasts current state-of-the-art methods, which either suffer from large cumulative regret or severe constraint violations for the case studies presented.
    Q-Learning for MDPs with General Spaces: Convergence and Near Optimality via Quantization under Weak Continuity. (arXiv:2111.06781v3 [cs.LG] UPDATED)
    Reinforcement learning algorithms often require finiteness of state and action spaces in Markov decision processes (MDPs) (also called controlled Markov chains) and various efforts have been made in the literature towards the applicability of such algorithms for continuous state and action spaces. In this paper, we show that under very mild regularity conditions (in particular, involving only weak continuity of the transition kernel of an MDP), Q-learning for standard Borel MDPs via quantization of states and actions (called Quantized Q-Learning) converges to a limit, and furthermore this limit satisfies an optimality equation which leads to near optimality with either explicit performance bounds or which are guaranteed to be asymptotically optimal. Our approach builds on (i) viewing quantization as a measurement kernel and thus a quantized MDP as a partially observed Markov decision process (POMDP), (ii) utilizing near optimality and convergence results of Q-learning for POMDPs, and (iii) finally, near-optimality of finite state model approximations for MDPs with weakly continuous kernels which we show to correspond to the fixed point of the constructed POMDP. Thus, our paper presents a very general convergence and approximation result for the applicability of Q-learning for continuous MDPs.
    GPT Can Solve Mathematical Problems Without a Calculator. (arXiv:2309.03241v1 [cs.LG])
    Previous studies have typically assumed that large language models are unable to accurately perform arithmetic operations, particularly multiplication of >8 digits, and operations involving decimals and fractions, without the use of calculator tools. This paper aims to challenge this misconception. With sufficient training data, a 2 billion-parameter language model can accurately perform multi-digit arithmetic operations with almost 100% accuracy without data leakage, significantly surpassing GPT-4 (whose multi-digit multiplication accuracy is only 4.3%). We also demonstrate that our MathGLM, fine-tuned from GLM-10B on a dataset with additional multi-step arithmetic operations and math problems described in text, achieves similar performance to GPT-4 on a 5,000-samples Chinese math problem test set.
    Interactive Hyperparameter Optimization in Multi-Objective Problems via Preference Learning. (arXiv:2309.03581v1 [cs.LG])
    Hyperparameter optimization (HPO) is important to leverage the full potential of machine learning (ML). In practice, users are often interested in multi-objective (MO) problems, i.e., optimizing potentially conflicting objectives, like accuracy and energy consumption. To tackle this, the vast majority of MO-ML algorithms return a Pareto front of non-dominated machine learning models to the user. Optimizing the hyperparameters of such algorithms is non-trivial as evaluating a hyperparameter configuration entails evaluating the quality of the resulting Pareto front. In literature, there are known indicators that assess the quality of a Pareto front (e.g., hypervolume, R2) by quantifying different properties (e.g., volume, proximity to a reference point). However, choosing the indicator that leads to the desired Pareto front might be a hard task for a user. In this paper, we propose a human-centered interactive HPO approach tailored towards multi-objective ML leveraging preference learning to extract desiderata from users that guide the optimization. Instead of relying on the user guessing the most suitable indicator for their needs, our approach automatically learns an appropriate indicator. Concretely, we leverage pairwise comparisons of distinct Pareto fronts to learn such an appropriate quality indicator. Then, we optimize the hyperparameters of the underlying MO-ML algorithm towards this learned indicator using a state-of-the-art HPO approach. In an experimental study targeting the environmental impact of ML, we demonstrate that our approach leads to substantially better Pareto fronts compared to optimizing based on a wrong indicator pre-selected by the user, and performs comparable in the case of an advanced user knowing which indicator to pick.
    Characterizing Lipschitz Stability of GNN for Fairness. (arXiv:2309.03648v1 [cs.LG])
    The Lipschitz bound, a technique from robust statistics, can limit the maximum changes in the output concerning the input, taking into account associated irrelevant biased factors. It is an efficient and provable method for examining the output stability of machine learning models without incurring additional computation costs. Recently, Graph Neural Networks (GNNs), which operate on non-Euclidean data, have gained significant attention. However, no previous research has investigated the GNN Lipschitz bounds to shed light on stabilizing model outputs, especially when working on non-Euclidean data with inherent biases. Given the inherent biases in common graph data used for GNN training, it poses a serious challenge to constraining the GNN output perturbations induced by input biases, thereby safeguarding fairness during training. Recently, despite the Lipschitz constant's use in controlling the stability of Euclideanneural networks, the calculation of the precise Lipschitz constant remains elusive for non-Euclidean neural networks like GNNs, especially within fairness contexts. To narrow this gap, we begin with the general GNNs operating on an attributed graph, and formulate a Lipschitz bound to limit the changes in the output regarding biases associated with the input. Additionally, we theoretically analyze how the Lipschitz constant of a GNN model could constrain the output perturbations induced by biases learned from data for fairness training. We experimentally validate the Lipschitz bound's effectiveness in limiting biases of the model output. Finally, from a training dynamics perspective, we demonstrate why the theoretical Lipschitz bound can effectively guide the GNN training to better trade-off between accuracy and fairness.
    Community-Based Hierarchical Positive-Unlabeled (PU) Model Fusion for Chronic Disease Prediction. (arXiv:2309.03386v1 [cs.LG])
    Positive-Unlabeled (PU) Learning is a challenge presented by binary classification problems where there is an abundance of unlabeled data along with a small number of positive data instances, which can be used to address chronic disease screening problem. State-of-the-art PU learning methods have resulted in the development of various risk estimators, yet they neglect the differences among distinct populations. To address this issue, we present a novel Positive-Unlabeled Learning Tree (PUtree) algorithm. PUtree is designed to take into account communities such as different age or income brackets, in tasks of chronic disease prediction. We propose a novel approach for binary decision-making, which hierarchically builds community-based PU models and then aggregates their deliverables. Our method can explicate each PU model on the tree for the optimized non-leaf PU node splitting. Furthermore, a mask-recovery data augmentation strategy enables sufficient training of the model in individual communities. Additionally, the proposed approach includes an adversarial PU risk estimator to capture hierarchical PU-relationships, and a model fusion network that integrates data from each tree path, resulting in robust binary classification results. We demonstrate the superior performance of PUtree as well as its variants on two benchmarks and a new diabetes-prediction dataset.
    Understanding Self-Supervised Learning of Speech Representation via Invariance and Redundancy Reduction. (arXiv:2309.03619v1 [cs.SD])
    The choice of the objective function is crucial in emerging high-quality representations from self-supervised learning. This paper investigates how different formulations of the Barlow Twins (BT) objective impact downstream task performance for speech data. We propose Modified Barlow Twins (MBT) with normalized latents to enforce scale-invariance and evaluate on speaker identification, gender recognition and keyword spotting tasks. Our results show MBT improves representation generalization over original BT, especially when fine-tuning with limited target data. This highlights the importance of designing objectives that encourage invariant and transferable representations. Our analysis provides insights into how the BT learning objective can be tailored to produce speech representations that excel when adapted to new downstream tasks. This study is an important step towards developing reusable self-supervised speech representations.
    A Natural Gas Consumption Forecasting System for Continual Learning Scenarios based on Hoeffding Trees with Change Point Detection Mechanism. (arXiv:2309.03720v1 [cs.LG])
    Forecasting natural gas consumption, considering seasonality and trends, is crucial in planning its supply and consumption and optimizing the cost of obtaining it, mainly by industrial entities. However, in times of threats to its supply, it is also a critical element that guarantees the supply of this raw material to meet individual consumers' needs, ensuring society's energy security. This article introduces a novel multistep ahead forecasting of natural gas consumption with change point detection integration for model collection selection with continual learning capabilities using data stream processing. The performance of the forecasting models based on the proposed approach is evaluated in a complex real-world use case of natural gas consumption forecasting. We employed Hoeffding tree predictors as forecasting models and the Pruned Exact Linear Time (PELT) algorithm for the change point detection procedure. The change point detection integration enables selecting a different model collection for successive time frames. Thus, three model collection selection procedures (with and without an error feedback loop) are defined and evaluated for forecasting scenarios with various densities of detected change points. These models were compared with change point agnostic baseline approaches. Our experiments show that fewer change points result in a lower forecasting error regardless of the model collection selection procedure employed. Also, simpler model collection selection procedures omitting forecasting error feedback leads to more robust forecasting models suitable for continual learning tasks.
    Large Language Models as Optimizers. (arXiv:2309.03409v1 [cs.LG])
    Optimization is ubiquitous. While derivative-based algorithms have been powerful tools for various problems, the absence of gradient imposes challenges on many real-world applications. In this work, we propose Optimization by PROmpting (OPRO), a simple and effective approach to leverage large language models (LLMs) as optimizers, where the optimization task is described in natural language. In each optimization step, the LLM generates new solutions from the prompt that contains previously generated solutions with their values, then the new solutions are evaluated and added to the prompt for the next optimization step. We first showcase OPRO on linear regression and traveling salesman problems, then move on to prompt optimization where the goal is to find instructions that maximize the task accuracy. With a variety of LLMs, we demonstrate that the best prompts optimized by OPRO outperform human-designed prompts by up to 8% on GSM8K, and by up to 50% on Big-Bench Hard tasks.
    Non-inferiority of Deep Learning Acute Ischemic Stroke Segmentation on Non-Contrast CT Compared to Expert Neuroradiologists. (arXiv:2211.15341v3 [eess.IV] UPDATED)
    To determine if a convolutional neural network (CNN) deep learning model can accurately segment acute ischemic changes on non-contrast CT compared to neuroradiologists. Non-contrast CT (NCCT) examinations from 232 acute ischemic stroke patients who were enrolled in the DEFUSE 3 trial were included in this study. Three experienced neuroradiologists independently segmented hypodensity that reflected the ischemic core on each scan. The neuroradiologist with the most experience (expert A) served as the ground truth for deep learning model training. Two additional neuroradiologists (experts B and C) segmentations were used for data testing. The 232 studies were randomly split into training and test sets. The training set was further randomly divided into 5 folds with training and validation sets. A 3-dimensional CNN architecture was trained and optimized to predict the segmentations of expert A from NCCT. The performance of the model was assessed using a set of volume, overlap, and distance metrics using non-inferiority thresholds of 20%, 3ml, and 3mm. The optimized model trained on expert A was compared to test experts B and C. We used a one-sided Wilcoxon signed-rank test to test for the non-inferiority of the model-expert compared to the inter-expert agreement. The final model performance for the ischemic core segmentation task reached a performance of 0.46+-0.09 Surface Dice at Tolerance 5mm and 0.47+-0.13 Dice when trained on expert A. Compared to the two test neuroradiologists the model-expert agreement was non-inferior to the inter-expert agreement, p < 0.05. The CNN accurately delineates the hypodense ischemic core on NCCT in acute ischemic stroke patients with an accuracy comparable to neuroradiologists.
    Cross-domain Sound Recognition for Efficient Underwater Data Analysis. (arXiv:2309.03451v1 [cs.SD])
    This paper presents a novel deep learning approach for analyzing massive underwater acoustic data by leveraging a model trained on a broad spectrum of non-underwater (aerial) sounds. Recognizing the challenge in labeling vast amounts of underwater data, we propose a two-fold methodology to accelerate this labor-intensive procedure. The first part of our approach involves PCA and UMAP visualization of the underwater data using the feature vectors of an aerial sound recognition model. This enables us to cluster the data in a two dimensional space and listen to points within these clusters to understand their defining characteristics. This innovative method simplifies the process of selecting candidate labels for further training. In the second part, we train a neural network model using both the selected underwater data and the non-underwater dataset. We conducted a quantitative analysis to measure the precision, recall, and F1 score of our model for recognizing airgun sounds, a common type of underwater sound. The F1 score achieved by our model exceeded 84.3%, demonstrating the effectiveness of our approach in analyzing underwater acoustic data. The methodology presented in this paper holds significant potential to reduce the amount of labor required in underwater data analysis and opens up new possibilities for further research in the field of cross-domain data analysis.
    Chat Failures and Troubles: Reasons and Solutions. (arXiv:2309.03708v1 [cs.RO])
    This paper examines some common problems in Human-Robot Interaction (HRI) causing failures and troubles in Chat. A given use case's design decisions start with the suitable robot, the suitable chatting model, identifying common problems that cause failures, identifying potential solutions, and planning continuous improvement. In conclusion, it is recommended to use a closed-loop control algorithm that guides the use of trained Artificial Intelligence (AI) pre-trained models and provides vocabulary filtering, re-train batched models on new datasets, learn online from data streams, and/or use reinforcement learning models to self-update the trained models and reduce errors.
    Reduced Simulations for High-Energy Physics, a Middle Ground for Data-Driven Physics Research. (arXiv:2309.03780v1 [hep-ex])
    Subatomic particle track reconstruction (tracking) is a vital task in High-Energy Physics experiments. Tracking is exceptionally computationally challenging and fielded solutions, relying on traditional algorithms, do not scale linearly. Machine Learning (ML) assisted solutions are a promising answer. We argue that a complexity-reduced problem description and the data representing it, will facilitate the solution exploration workflow. We provide the REDuced VIrtual Detector (REDVID) as a complexity-reduced detector model and particle collision event simulator combo. REDVID is intended as a simulation-in-the-loop, to both generate synthetic data efficiently and to simplify the challenge of ML model design. The fully parametric nature of our tool, with regards to system-level configuration, while in contrast to physics-accurate simulations, allows for the generation of simplified data for research and education, at different levels. Resulting from the reduced complexity, we showcase the computational efficiency of REDVID by providing the computational cost figures for a multitude of simulation benchmarks. As a simulation and a generative tool for ML-assisted solution design, REDVID is highly flexible, reusable and open-source. Reference data sets generated with REDVID are publicly available.
    Improved theoretical guarantee for rank aggregation via spectral method. (arXiv:2309.03808v1 [stat.ML])
    Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
    Uncovering Drift in Textual Data: An Unsupervised Method for Detecting and Mitigating Drift in Machine Learning Models. (arXiv:2309.03831v1 [cs.CL])
    Drift in machine learning refers to the phenomenon where the statistical properties of data or context, in which the model operates, change over time leading to a decrease in its performance. Therefore, maintaining a constant monitoring process for machine learning model performance is crucial in order to proactively prevent any potential performance regression. However, supervised drift detection methods require human annotation and consequently lead to a longer time to detect and mitigate the drift. In our proposed unsupervised drift detection method, we follow a two step process. Our first step involves encoding a sample of production data as the target distribution, and the model training data as the reference distribution. In the second step, we employ a kernel-based statistical test that utilizes the maximum mean discrepancy (MMD) distance metric to compare the reference and target distributions and estimate any potential drift. Our method also identifies the subset of production data that is the root cause of the drift. The models retrained using these identified high drift samples show improved performance on online customer experience quality metrics.
    Cross-Image Context Matters for Bongard Problems. (arXiv:2309.03468v1 [cs.CV])
    Current machine learning methods struggle to solve Bongard problems, which are a type of IQ test that requires deriving an abstract "concept" from a set of positive and negative "support" images, and then classifying whether or not a new query image depicts the key concept. On Bongard-HOI, a benchmark for natural-image Bongard problems, existing methods have only reached 66% accuracy (where chance is 50%). Low accuracy is often attributed to neural nets' lack of ability to find human-like symbolic rules. In this work, we point out that many existing methods are forfeiting accuracy due to a much simpler problem: they do not incorporate information contained in the support set as a whole, and rely instead on information extracted from individual supports. This is a critical issue, because unlike in few-shot learning tasks concerning object classification, the "key concept" in a typical Bongard problem can only be distinguished using multiple positives and multiple negatives. We explore a variety of simple methods to take this cross-image context into account, and demonstrate substantial gains over prior methods, leading to new state-of-the-art performance on Bongard-LOGO (75.3%) and Bongard-HOI (72.45%) and strong performance on the original Bongard problem set (60.84%).
    A Causal Perspective on Loan Pricing: Investigating the Impacts of Selection Bias on Identifying Bid-Response Functions. (arXiv:2309.03730v1 [cs.LG])
    In lending, where prices are specific to both customers and products, having a well-functioning personalized pricing policy in place is essential to effective business making. Typically, such a policy must be derived from observational data, which introduces several challenges. While the problem of ``endogeneity'' is prominently studied in the established pricing literature, the problem of selection bias (or, more precisely, bid selection bias) is not. We take a step towards understanding the effects of selection bias by posing pricing as a problem of causal inference. Specifically, we consider the reaction of a customer to price a treatment effect. In our experiments, we simulate varying levels of selection bias on a semi-synthetic dataset on mortgage loan applications in Belgium. We investigate the potential of parametric and nonparametric methods for the identification of individual bid-response functions. Our results illustrate how conventional methods such as logistic regression and neural networks suffer adversely from selection bias. In contrast, we implement state-of-the-art methods from causal machine learning and show their capability to overcome selection bias in pricing data.
    Broadband Ground Motion Synthesis via Generative Adversarial Neural Operators: Development and Validation. (arXiv:2309.03447v1 [physics.geo-ph])
    We present a data-driven model for ground-motion synthesis using a Generative Adversarial Neural Operator (GANO) that combines recent advancements in machine learning and open access strong motion data sets to generate three-component acceleration time histories conditioned on moment magnitude ($M$), rupture distance ($R_{rup}$), time-average shear-wave velocity at the top $30m$ ($V_{S30}$), and tectonic environment or style of faulting. We use Neural Operators, a resolution invariant architecture that guarantees that the model training is independent of the data sampling frequency. We first present the conditional ground-motion synthesis algorithm (referred to heretofore as cGM-GANO) and discuss its advantages compared to previous work. Next, we verify the cGM-GANO framework using simulated ground motions generated with the Southern California Earthquake Center (SCEC) Broadband Platform (BBP). We lastly train cGM-GANO on a KiK-net dataset from Japan, showing that the framework can recover the magnitude, distance, and $V_{S30}$ scaling of Fourier amplitude and pseudo-spectral accelerations. We evaluate cGM-GANO through residual analysis with the empirical dataset as well as by comparison with conventional Ground Motion Models (GMMs) for selected ground motion scenarios. Results show that cGM-GANO produces consistent median scaling with the GMMs for the corresponding tectonic environments. The largest misfit is observed at short distances due to the scarcity of training data. With the exception of short distances, the aleatory variability of the response spectral ordinates is also well captured, especially for subduction events due to the adequacy of training data. Applications of the presented framework include generation of risk-targeted ground motions for site-specific engineering applications.
    Filtration Surfaces for Dynamic Graph Classification. (arXiv:2309.03616v1 [cs.LG])
    Existing approaches for classifying dynamic graphs either lift graph kernels to the temporal domain, or use graph neural networks (GNNs). However, current baselines have scalability issues, cannot handle a changing node set, or do not take edge weight information into account. We propose filtration surfaces, a novel method that is scalable and flexible, to alleviate said restrictions. We experimentally validate the efficacy of our model and show that filtration surfaces outperform previous state-of-the-art baselines on datasets that rely on edge weight information. Our method does so while being either completely parameter-free or having at most one parameter, and yielding the lowest overall standard deviation.
    Testing properties of distributions in the streaming model. (arXiv:2309.03245v1 [cs.DS])
    We study distribution testing in the standard access model and the conditional access model when the memory available to the testing algorithm is bounded. In both scenarios, the samples appear in an online fashion and the goal is to test the properties of distribution using an optimal number of samples subject to a memory constraint on how many samples can be stored at a given time. First, we provide a trade-off between the sample complexity and the space complexity for testing identity when the samples are drawn according to the conditional access oracle. We then show that we can learn a succinct representation of a monotone distribution efficiently with a memory constraint on the number of samples that are stored that is almost optimal. We also show that the algorithm for monotone distributions can be extended to a larger class of decomposable distributions.
    A Robust Negative Learning Approach to Partial Domain Adaptation Using Source Prototypes. (arXiv:2309.03531v1 [cs.CV])
    This work proposes a robust Partial Domain Adaptation (PDA) framework that mitigates the negative transfer problem by incorporating a robust target-supervision strategy. It leverages ensemble learning and includes diverse, complementary label feedback, alleviating the effect of incorrect feedback and promoting pseudo-label refinement. Rather than relying exclusively on first-order moments for distribution alignment, our approach offers explicit objectives to optimize intra-class compactness and inter-class separation with the inferred source prototypes and highly-confident target samples in a domain-invariant fashion. Notably, we ensure source data privacy by eliminating the need to access the source data during the adaptation phase through a priori inference of source prototypes. We conducted a series of comprehensive experiments, including an ablation analysis, covering a range of partial domain adaptation tasks. Comprehensive evaluations on benchmark datasets corroborate our framework's enhanced robustness and generalization, demonstrating its superiority over existing state-of-the-art PDA approaches.
    Knowledge Graphs in Practice: Characterizing their Users, Challenges, and Visualization Opportunities. (arXiv:2304.01311v3 [cs.HC] UPDATED)
    This study presents insights from interviews with nineteen Knowledge Graph (KG) practitioners who work in both enterprise and academic settings on a wide variety of use cases. Through this study, we identify critical challenges experienced by KG practitioners when creating, exploring, and analyzing KGs that could be alleviated through visualization design. Our findings reveal three major personas among KG practitioners - KG Builders, Analysts, and Consumers - each of whom have their own distinct expertise and needs. We discover that KG Builders would benefit from schema enforcers, while KG Analysts need customizable query builders that provide interim query results. For KG Consumers, we identify a lack of efficacy for node-link diagrams, and the need for tailored domain-specific visualizations to promote KG adoption and comprehension. Lastly, we find that implementing KGs effectively in practice requires both technical and social solutions that are not addressed with current tools, technologies, and collaborative workflows. From the analysis of our interviews, we distill several visualization research directions to improve KG usability, including knowledge cards that balance digestibility and discoverability, timeline views to track temporal changes, interfaces that support organic discovery, and semantic explanations for AI and machine learning predictions.
    LDMRes-Net: Enabling Efficient Medical Image Segmentation on IoT and Edge Platforms. (arXiv:2306.06145v2 [eess.IV] UPDATED)
    In this study, we propose LDMRes-Net, a lightweight dual-multiscale residual block-based computational neural network tailored for medical image segmentation on IoT and edge platforms. Conventional U-Net-based models face challenges in meeting the speed and efficiency demands of real-time clinical applications, such as disease monitoring, radiation therapy, and image-guided surgery. LDMRes-Net overcomes these limitations with its remarkably low number of learnable parameters (0.072M), making it highly suitable for resource-constrained devices. The model's key innovation lies in its dual multi-residual block architecture, which enables the extraction of refined features on multiple scales, enhancing overall segmentation performance. To further optimize efficiency, the number of filters is carefully selected to prevent overlap, reduce training time, and improve computational efficiency. The study includes comprehensive evaluations, focusing on segmentation of the retinal image of vessels and hard exudates crucial for the diagnosis and treatment of ophthalmology. The results demonstrate the robustness, generalizability, and high segmentation accuracy of LDMRes-Net, positioning it as an efficient tool for accurate and rapid medical image segmentation in diverse clinical applications, particularly on IoT and edge platforms. Such advances hold significant promise for improving healthcare outcomes and enabling real-time medical image analysis in resource-limited settings.
    Efficient Single Object Detection on Image Patches with Early Exit Enhanced High-Precision CNNs. (arXiv:2309.03530v1 [cs.CV])
    This paper proposes a novel approach for detecting objects using mobile robots in the context of the RoboCup Standard Platform League, with a primary focus on detecting the ball. The challenge lies in detecting a dynamic object in varying lighting conditions and blurred images caused by fast movements. To address this challenge, the paper presents a convolutional neural network architecture designed specifically for computationally constrained robotic platforms. The proposed CNN is trained to achieve high precision classification of single objects in image patches and to determine their precise spatial positions. The paper further integrates Early Exits into the existing high-precision CNN architecture to reduce the computational cost of easily rejectable cases in the background class. The training process involves a composite loss function based on confidence and positional losses with dynamic weighting and data augmentation. The proposed approach achieves a precision of 100% on the validation dataset and a recall of almost 87%, while maintaining an execution time of around 170 $\mu$s per hypotheses. By combining the proposed approach with an Early Exit, a runtime optimization of more than 28%, on average, can be achieved compared to the original CNN. Overall, this paper provides an efficient solution for an enhanced detection of objects, especially the ball, in computationally constrained robotic platforms.
    Source Camera Identification and Detection in Digital Videos through Blind Forensics. (arXiv:2309.03353v1 [cs.CV])
    Source camera identification in digital videos is the problem of associating an unknown digital video with its source device, within a closed set of possible devices. The existing techniques in source detection of digital videos try to find a fingerprint of the actual source in the video in form of PRNU (Photo Response Non--Uniformity), and match it against the SPN (Sensor Pattern Noise) of each possible device. The highest correlation indicates the correct source. We investigate the problem of identifying a video source through a feature based approach using machine learning. In this paper, we present a blind forensic technique of video source authentication and identification, based on feature extraction, feature selection and subsequent source classification. The main aim is to determine whether a claimed source for a video is actually its original source. If not, we identify its original source. Our experimental results prove the efficiency of the proposed method compared to traditional fingerprint based technique.
    Byzantine-Robust Federated Learning with Variance Reduction and Differential Privacy. (arXiv:2309.03437v1 [cs.LG])
    Federated learning (FL) is designed to preserve data privacy during model training, where the data remains on the client side (i.e., IoT devices), and only model updates of clients are shared iteratively for collaborative learning. However, this process is vulnerable to privacy attacks and Byzantine attacks: the local model updates shared throughout the FL network will leak private information about the local training data, and they can also be maliciously crafted by Byzantine attackers to disturb the learning. In this paper, we propose a new FL scheme that guarantees rigorous privacy and simultaneously enhances system robustness against Byzantine attacks. Our approach introduces sparsification- and momentum-driven variance reduction into the client-level differential privacy (DP) mechanism, to defend against Byzantine attackers. The security design does not violate the privacy guarantee of the client-level DP mechanism; hence, our approach achieves the same client-level DP guarantee as the state-of-the-art. We conduct extensive experiments on both IID and non-IID datasets and different tasks and evaluate the performance of our approach against different Byzantine attacks by comparing it with state-of-the-art defense methods. The results of our experiments show the efficacy of our framework and demonstrate its ability to improve system robustness against Byzantine attacks while achieving a strong privacy guarantee.
    Automated Bioinformatics Analysis via AutoBA. (arXiv:2309.03242v1 [q-bio.GN])
    With the fast-growing and evolving omics data, the demand for streamlined and adaptable tools to handle the analysis continues to grow. In response to this need, we introduce Auto Bioinformatics Analysis (AutoBA), an autonomous AI agent based on a large language model designed explicitly for conventional omics data analysis. AutoBA simplifies the analytical process by requiring minimal user input while delivering detailed step-by-step plans for various bioinformatics tasks. Through rigorous validation by expert bioinformaticians, AutoBA's robustness and adaptability are affirmed across a diverse range of omics analysis cases, including whole genome sequencing (WGS), RNA sequencing (RNA-seq), single-cell RNA-seq, ChIP-seq, and spatial transcriptomics. AutoBA's unique capacity to self-design analysis processes based on input data variations further underscores its versatility. Compared with online bioinformatic services, AutoBA deploys the analysis locally, preserving data privacy. Moreover, different from the predefined pipeline, AutoBA has adaptability in sync with emerging bioinformatics tools. Overall, AutoBA represents a convenient tool, offering robustness and adaptability for complex omics data analysis.
    Which algorithm to select in sports timetabling?. (arXiv:2309.03229v1 [cs.AI])
    Any sports competition needs a timetable, specifying when and where teams meet each other. The recent International Timetabling Competition (ITC2021) on sports timetabling showed that, although it is possible to develop general algorithms, the performance of each algorithm varies considerably over the problem instances. This paper provides an instance space analysis for sports timetabling, resulting in powerful insights into the strengths and weaknesses of eight state-of-the-art algorithms. Based on machine learning techniques, we propose an algorithm selection system that predicts which algorithm is likely to perform best when given the characteristics of a sports timetabling problem instance. Furthermore, we identify which characteristics are important in making that prediction, providing insights in the performance of the algorithms, and suggestions to further improve them. Finally, we assess the empirical hardness of the instances. Our results are based on large computational experiments involving about 50 years of CPU time on more than 500 newly generated problem instances.
    Quantum-AI empowered Intelligent Surveillance: Advancing Public Safety Through Innovative Contraband Detection. (arXiv:2309.03231v1 [quant-ph])
    Surveillance systems have emerged as crucial elements in upholding peace and security in the modern world. Their ubiquity aids in monitoring suspicious activities effectively. However, in densely populated environments, continuous active monitoring becomes impractical, necessitating the development of intelligent surveillance systems. AI integration in the surveillance domain was a big revolution, however, speed issues have prevented its widespread implementation in the field. It has been observed that quantum artificial intelligence has led to a great breakthrough. Quantum artificial intelligence-based surveillance systems have shown to be more accurate as well as capable of performing well in real-time scenarios, which had never been seen before. In this research, a RentinaNet model is integrated with Quantum CNN and termed as Quantum-RetinaNet. By harnessing the Quantum capabilities of QCNN, Quantum-RetinaNet strikes a balance between accuracy and speed. This innovative integration positions it as a game-changer, addressing the challenges of active monitoring in densely populated scenarios. As demand for efficient surveillance solutions continues to grow, Quantum-RetinaNet offers a compelling alternative to existing CNN models, upholding accuracy standards without sacrificing real-time performance. The unique attributes of Quantum-RetinaNet have far-reaching implications for the future of intelligent surveillance. With its enhanced processing speed, it is poised to revolutionize the field, catering to the pressing need for rapid yet precise monitoring. As Quantum-RetinaNet becomes the new standard, it ensures public safety and security while pushing the boundaries of AI in surveillance.
    Relay Diffusion: Unifying diffusion process across resolutions for image synthesis. (arXiv:2309.03350v1 [cs.CV])
    Diffusion models achieved great success in image synthesis, but still face challenges in high-resolution generation. Through the lens of discrete cosine transformation, we find the main reason is that \emph{the same noise level on a higher resolution results in a higher Signal-to-Noise Ratio in the frequency domain}. In this work, we present Relay Diffusion Model (RDM), which transfers a low-resolution image or noise into an equivalent high-resolution one for diffusion model via blurring diffusion and block noise. Therefore, the diffusion process can continue seamlessly in any new resolution or model without restarting from pure noise or low-resolution conditioning. RDM achieves state-of-the-art FID on CelebA-HQ and sFID on ImageNet 256$\times$256, surpassing previous works such as ADM, LDM and DiT by a large margin. All the codes and checkpoints are open-sourced at \url{https://github.com/THUDM/RelayDiffusion}.
    Personalized Tucker Decomposition: Modeling Commonality and Peculiarity on Tensor Data. (arXiv:2309.03439v1 [cs.LG])
    We propose personalized Tucker decomposition (perTucker) to address the limitations of traditional tensor decomposition methods in capturing heterogeneity across different datasets. perTucker decomposes tensor data into shared global components and personalized local components. We introduce a mode orthogonality assumption and develop a proximal gradient regularized block coordinate descent algorithm that is guaranteed to converge to a stationary point. By learning unique and common representations across datasets, we demonstrate perTucker's effectiveness in anomaly detection, client classification, and clustering through a simulation study and two case studies on solar flare detection and tonnage signal classification.
    ViewMix: Augmentation for Robust Representation in Self-Supervised Learning. (arXiv:2309.03360v1 [cs.CV])
    Joint Embedding Architecture-based self-supervised learning methods have attributed the composition of data augmentations as a crucial factor for their strong representation learning capabilities. While regional dropout strategies have proven to guide models to focus on lesser indicative parts of the objects in supervised methods, it hasn't been adopted by self-supervised methods for generating positive pairs. This is because the regional dropout methods are not suitable for the input sampling process of the self-supervised methodology. Whereas dropping informative pixels from the positive pairs can result in inefficient training, replacing patches of a specific object with a different one can steer the model from maximizing the agreement between different positive pairs. Moreover, joint embedding representation learning methods have not made robustness their primary training outcome. To this end, we propose the ViewMix augmentation policy, specially designed for self-supervised learning, upon generating different views of the same image, patches are cut and pasted from one view to another. By leveraging the different views created by this augmentation strategy, multiple joint embedding-based self-supervised methodologies obtained better localization capability and consistently outperformed their corresponding baseline methods. It is also demonstrated that incorporating ViewMix augmentation policy promotes robustness of the representations in the state-of-the-art methods. Furthermore, our experimentation and analysis of compute times suggest that ViewMix augmentation doesn't introduce any additional overhead compared to other counterparts.
    Towards Comparable Knowledge Distillation in Semantic Image Segmentation. (arXiv:2309.03659v1 [cs.CV])
    Knowledge Distillation (KD) is one proposed solution to large model sizes and slow inference speed in semantic segmentation. In our research we identify 25 proposed distillation loss terms from 14 publications in the last 4 years. Unfortunately, a comparison of terms based on published results is often impossible, because of differences in training configurations. A good illustration of this problem is the comparison of two publications from 2022. Using the same models and dataset, Structural and Statistical Texture Distillation (SSTKD) reports an increase of student mIoU of 4.54 and a final performance of 29.19, while Adaptive Perspective Distillation (APD) only improves student performance by 2.06 percentage points, but achieves a final performance of 39.25. The reason for such extreme differences is often a suboptimal choice of hyperparameters and a resulting underperformance of the student model used as reference point. In our work, we reveal problems of insufficient hyperparameter tuning by showing that distillation improvements of two widely accepted frameworks, SKD and IFVD, vanish when hyperparameters are optimized sufficiently. To improve comparability of future research in the field, we establish a solid baseline for three datasets and two student models and provide extensive information on hyperparameter tuning. We find that only two out of eight techniques can compete with our simple baseline on the ADE20K dataset.
    Multi-Modality Guidance Network For Missing Modality Inference. (arXiv:2309.03452v1 [cs.CV])
    Multimodal models have gained significant success in recent years. Standard multimodal approaches often assume unchanged modalities from training stage to inference stage. In practice, however, many scenarios fail to satisfy such assumptions with missing modalities during inference, leading to limitations on where multimodal models can be applied. While existing methods mitigate the problem through reconstructing the missing modalities, it increases unnecessary computational cost, which could be just as critical, especially for large, deployed systems. To solve the problem from both sides, we propose a novel guidance network that promotes knowledge sharing during training, taking advantage of the multimodal representations to train better single-modality models for inference. Real-life experiment in violence detection shows that our proposed framework trains single-modality models that significantly outperform its traditionally trained counterparts while maintaining the same inference cost.
    Robotic Table Tennis: A Case Study into a High Speed Learning System. (arXiv:2309.03315v1 [cs.RO])
    We present a deep-dive into a real-world robotic learning system that, in previous work, was shown to be capable of hundreds of table tennis rallies with a human and has the ability to precisely return the ball to desired targets. This system puts together a highly optimized perception subsystem, a high-speed low-latency robot controller, a simulation paradigm that can prevent damage in the real world and also train policies for zero-shot transfer, and automated real world environment resets that enable autonomous training and evaluation on physical robots. We complement a complete system description, including numerous design decisions that are typically not widely disseminated, with a collection of studies that clarify the importance of mitigating various sources of latency, accounting for training and deployment distribution shifts, robustness of the perception system, sensitivity to policy hyper-parameters, and choice of action space. A video demonstrating the components of the system and details of experimental results can be found at https://youtu.be/uFcnWjB42I0.
    No Train Still Gain. Unleash Mathematical Reasoning of Large Language Models with Monte Carlo Tree Search Guided by Energy Function. (arXiv:2309.03224v1 [cs.AI])
    Large language models (LLMs) exhibit impressive language understanding and in-context learning abilities including natural language processing (NLP) tasks and challenging mathematical reasoning. However, due to the lack of process-supervision, applying PLMs to mathematical reasoning tasks often fail to generate correct reasoning steps and final answer even though solutions have high probabilities. To unleash the mathematical reasoning of finetuned-LLMs without any further fineutuning steps, we propose a method to endow LLMs with immediate reaction and delicate reasoning system via Monte Carlo Tree Search(MCTS) and a light energy function to rank the decision steps. In particular, We first re-formalize the finetuned-LLMs to a Residual-based Energy Model~(Residual-EBM) and apply noise contrastive estimation to estimate the parameters of energy function . Then we use MCTS with energy function as path verifier to search the output space and evaluating the reasoning path. Through extensive experiments on two mathematical reasoning benchmarks, namely GSM8k and MATH, we reveal the extraordinary capabilities of our method that improve the pass@1 of the finetuned-model without further finetuning or RLHF alignment by a substantial margin.
    Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat. (arXiv:2309.03237v1 [cs.LG])
    We carefully evaluate a number of algorithms for learning in a federated environment, and test their utility for a variety of image classification tasks. We consider many issues that have not been adequately considered before: whether learning over data sets that do not have diverse sets of images affects the results; whether to use a pre-trained feature extraction "backbone"; how to evaluate learner performance (we argue that classification accuracy is not enough), among others. Overall, across a wide variety of settings, we find that vertically decomposing a neural network seems to give the best results, and outperforms more standard reconciliation-used methods.
    Using Neural Networks for Fast SAR Roughness Estimation of High Resolution Images. (arXiv:2309.03351v1 [cs.CV])
    The analysis of Synthetic Aperture Radar (SAR) imagery is an important step in remote sensing applications, and it is a challenging problem due to its inherent speckle noise. One typical solution is to model the data using the $G_I^0$ distribution and extract its roughness information, which in turn can be used in posterior imaging tasks, such as segmentation, classification and interpretation. This leads to the need of quick and reliable estimation of the roughness parameter from SAR data, especially with high resolution images. Unfortunately, traditional parameter estimation procedures are slow and prone to estimation failures. In this work, we proposed a neural network-based estimation framework that first learns how to predict underlying parameters of $G_I^0$ samples and then can be used to estimate the roughness of unseen data. We show that this approach leads to an estimator that is quicker, yields less estimation error and is less prone to failures than the traditional estimation procedures for this problem, even when we use a simple network. More importantly, we show that this same methodology can be generalized to handle image inputs and, even if trained on purely synthetic data for a few seconds, is able to perform real time pixel-wise roughness estimation for high resolution real SAR imagery.
    A Probabilistic Semi-Supervised Approach with Triplet Markov Chains. (arXiv:2309.03707v1 [stat.ML])
    Triplet Markov chains are general generative models for sequential data which take into account three kinds of random variables: (noisy) observations, their associated discrete labels and latent variables which aim at strengthening the distribution of the observations and their associated labels. However, in practice, we do not have at our disposal all the labels associated to the observations to estimate the parameters of such models. In this paper, we propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models in a semi-supervised context. The generality of our approach enables us to derive semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.
    TSGBench: Time Series Generation Benchmark. (arXiv:2309.03755v1 [cs.LG])
    Synthetic Time Series Generation (TSG) is crucial in a range of applications, including data augmentation, anomaly detection, and privacy preservation. Although significant strides have been made in this field, existing methods exhibit three key limitations: (1) They often benchmark against similar model types, constraining a holistic view of performance capabilities. (2) The use of specialized synthetic and private datasets introduces biases and hampers generalizability. (3) Ambiguous evaluation measures, often tied to custom networks or downstream tasks, hinder consistent and fair comparison. To overcome these limitations, we introduce \textsf{TSGBench}, the inaugural TSG Benchmark, designed for a unified and comprehensive assessment of TSG methods. It comprises three modules: (1) a curated collection of publicly available, real-world datasets tailored for TSG, together with a standardized preprocessing pipeline; (2) a comprehensive evaluation measures suite including vanilla measures, new distance-based assessments, and visualization tools; (3) a pioneering generalization test rooted in Domain Adaptation (DA), compatible with all methods. We have conducted extensive experiments across ten real-world datasets from diverse domains, utilizing ten advanced TSG methods and twelve evaluation measures, all gauged through \textsf{TSGBench}. The results highlight its remarkable efficacy and consistency. More importantly, \textsf{TSGBench} delivers a statistical breakdown of method rankings, illuminating performance variations across different datasets and measures, and offering nuanced insights into the effectiveness of each method.
    EGIC: Enhanced Low-Bit-Rate Generative Image Compression Guided by Semantic Segmentation. (arXiv:2309.03244v1 [eess.IV])
    We introduce EGIC, a novel generative image compression method that allows traversing the distortion-perception curve efficiently from a single model. Specifically, we propose an implicitly encoded variant of image interpolation that predicts the residual between a MSE-optimized and GAN-optimized decoder output. On the receiver side, the user can then control the impact of the residual on the GAN-based reconstruction. Together with improved GAN-based building blocks, EGIC outperforms a wide-variety of perception-oriented and distortion-oriented baselines, including HiFiC, MRIC and DIRAC, while performing almost on par with VTM-20.0 on the distortion end. EGIC is simple to implement, very lightweight (e.g. 0.18x model parameters compared to HiFiC) and provides excellent interpolation characteristics, which makes it a promising candidate for practical applications targeting the low bit range.
    BoXHED2.0: Scalable boosting of dynamic survival analysis. (arXiv:2103.12591v5 [cs.LG] UPDATED)
    Modern applications of survival analysis increasingly involve time-dependent covariates. The Python package BoXHED2.0 is a tree-boosted hazard estimator that is fully nonparametric, and is applicable to survival settings far more general than right-censoring, including recurring events and competing risks. BoXHED2.0 is also scalable to the point of being on the same order of speed as parametric boosted survival models, in part because its core is written in C++ and it also supports the use of GPUs and multicore CPUs. BoXHED2.0 is available from PyPI and also from www.github.com/BoXHED.
    Domain Generalization for Mammographic Image Analysis with Contrastive Learning. (arXiv:2304.10226v5 [cs.CV] UPDATED)
    The deep learning technique has been shown to be effectively addressed several image analysis tasks in the computer-aided diagnosis scheme for mammography. The training of an efficacious deep learning model requires large data with diverse styles and qualities. The diversity of data often comes from the use of various scanners of vendors. But, in practice, it is impractical to collect a sufficient amount of diverse data for training. To this end, a novel contrastive learning is developed to equip the deep learning models with better style generalization capability. Specifically, the multi-style and multi-view unsupervised self-learning scheme is carried out to seek robust feature embedding against style diversity as a pretrained model. Afterward, the pretrained network is further fine-tuned to the downstream tasks, e.g., mass detection, matching, BI-RADS rating, and breast density classification. The proposed method has been evaluated extensively and rigorously with mammograms from various vendor style domains and several public datasets. The experimental results suggest that the proposed domain generalization method can effectively improve performance of four mammographic image tasks on the data from both seen and unseen domains, and outperform many state-of-the-art (SOTA) generalization methods.
    Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences. (arXiv:2309.03791v1 [cs.LG])
    We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These methods are based on a new class of optimal-transport-regularized divergences, constructed via an infimal convolution between an information divergence and an optimal-transport (OT) cost. We use these as tools to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence. We demonstrate the effectiveness of our method on malware detection and image recognition applications and find that, to our knowledge, it outperforms existing methods at enhancing the robustness against adversarial attacks. $ARMOR_D$ yields the robustified accuracy of $98.29\%$ against $FGSM$ and $98.18\%$ against $PGD^{40}$ on the MNIST dataset, reducing the error rate by more than $19.7\%$ and $37.2\%$ respectively compared to prior methods. Similarly, in malware detection, a discrete (binary) data domain, $ARMOR_D$ improves the robustified accuracy under $rFGSM^{50}$ attack compared to the previous best-performing adversarial training methods by $37.0\%$ while lowering false negative and false positive rates by $51.1\%$ and $57.53\%$, respectively.
    Towards Personalized Federated Learning via Heterogeneous Model Reassembly. (arXiv:2308.08643v2 [cs.LG] UPDATED)
    This paper focuses on addressing the practical yet challenging problem of model heterogeneity in federated learning, where clients possess models with different network structures. To track this problem, we propose a novel framework called pFedHR, which leverages heterogeneous model reassembly to achieve personalized federated learning. In particular, we approach the problem of heterogeneous model personalization as a model-matching optimization task on the server side. Moreover, pFedHR automatically and dynamically generates informative and diverse personalized candidates with minimal human intervention. Furthermore, our proposed heterogeneous model reassembly technique mitigates the adverse impact introduced by using public data with different distributions from the client data to a certain extent. Experimental results demonstrate that pFedHR outperforms baselines on three datasets under both IID and Non-IID settings. Additionally, pFedHR effectively reduces the adverse impact of using different public data and dynamically generates diverse personalized models in an automated manner.
    Learning continuous-valued treatment effects through representation balancing. (arXiv:2309.03731v1 [cs.LG])
    Estimating the effects of treatments with an associated dose on an instance's outcome, the "dose response", is relevant in a variety of domains, from healthcare to business, economics, and beyond. Such effects, also known as continuous-valued treatment effects, are typically estimated from observational data, which may be subject to dose selection bias. This means that the allocation of doses depends on pre-treatment covariates. Previous studies have shown that conventional machine learning approaches fail to learn accurate individual estimates of dose responses under the presence of dose selection bias. In this work, we propose CBRNet, a causal machine learning approach to estimate an individual dose response from observational data. CBRNet adopts the Neyman-Rubin potential outcome framework and extends the concept of balanced representation learning for overcoming selection bias to continuous-valued treatments. Our work is the first to apply representation balancing in a continuous-valued treatment setting. We evaluate our method on a newly proposed benchmark. Our experiments demonstrate CBRNet's ability to accurately learn treatment effects under selection bias and competitive performance with respect to other state-of-the-art methods.
    Graph Fairing Convolutional Networks for Anomaly Detection. (arXiv:2010.10274v2 [cs.LG] UPDATED)
    Graph convolution is a fundamental building block for many deep neural networks on graph-structured data. In this paper, we introduce a simple, yet very effective graph convolutional network with skip connections for semi-supervised anomaly detection. The proposed layerwise propagation rule of our model is theoretically motivated by the concept of implicit fairing in geometry processing, and comprises a graph convolution module for aggregating information from immediate node neighbors and a skip connection module for combining layer-wise neighborhood representations. This propagation rule is derived from the iterative solution of the implicit fairing equation via the Jacobi method. In addition to capturing information from distant graph nodes through skip connections between the network's layers, our approach exploits both the graph structure and node features for learning discriminative node representations. These skip connections are integrated by design in our proposed network architecture. The effectiveness of our model is demonstrated through extensive experiments on five benchmark datasets, achieving better or comparable anomaly detection results against strong baseline methods. We also demonstrate through an ablation study that skip connection helps improve the model performance.
    Continual Pre-Training of Large Language Models: How to (re)warm your model?. (arXiv:2308.04014v2 [cs.CL] UPDATED)
    Large language models (LLMs) are routinely pre-trained on billions of tokens, only to restart the process over again once new data becomes available. A much cheaper and more efficient solution would be to enable the continual pre-training of these models, i.e. updating pre-trained models with new data instead of re-training them from scratch. However, the distribution shift induced by novel data typically results in degraded performance on past data. Taking a step towards efficient continual pre-training, in this work, we examine the effect of different warm-up strategies. Our hypothesis is that the learning rate must be re-increased to improve compute efficiency when training on a new dataset. We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule. We conduct all experiments on the Pythia 410M language model architecture and evaluate performance through validation perplexity. We experiment with different pre-training checkpoints, various maximum learning rates, and various warmup lengths. Our results show that while rewarming models first increases the loss on upstream and downstream data, in the longer run it improves the downstream performance, outperforming models trained from scratch$\unicode{x2013}$even for a large downstream dataset.
    Trinary Decision Trees for missing value handling. (arXiv:2309.03561v1 [stat.ML])
    This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR), and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lacking behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.
    Generating quantum feature maps using multi-objective genetic algorithm. (arXiv:2309.03307v1 [quant-ph])
    We present a novel approach for efficiently generating quantum feature maps for quantum-enhanced support vector machines, a kernel-based classifier, enabling access to high-dimensional Hilbert space. Our method employs a multi-objective genetic algorithm that simultaneously maximizes classification accuracy while minimizing both the local and non-local gate costs of the quantum feature map's circuit. To achieve this, we define distinct fitness functions for local gates and entanglement gates. Comparisons with classical classifiers are given in order to understand the advantages of using quantum machine learning. Surprisingly, our experiments reveal that the optimal configuration of quantum circuits for the quantum kernel method incorporates a proportional number of non-local gates for entanglement, contrary to previous literature where non-local gates were largely suppressed. Furthermore, we demonstrate that the separability indexes of data can be effectively leveraged to determine the number of non-local gates required for the quantum support vector machine's feature maps. This insight can significantly aid in selecting appropriate parameters, such as the entanglement parameter, in various quantum programming packages like quiskit.org based on data analysis. Our findings offer valuable guidance for enhancing the efficiency and accuracy of quantum machine learning algorithms.
    RAHNet: Retrieval Augmented Hybrid Network for Long-tailed Graph Classification. (arXiv:2308.02335v2 [cs.LG] UPDATED)
    Graph classification is a crucial task in many real-world multimedia applications, where graphs can represent various multimedia data types such as images, videos, and social networks. Previous efforts have applied graph neural networks (GNNs) in balanced situations where the class distribution is balanced. However, real-world data typically exhibit long-tailed class distributions, resulting in a bias towards the head classes when using GNNs and limited generalization ability over the tail classes. Recent approaches mainly focus on re-balancing different classes during model training, which fails to explicitly introduce new knowledge and sacrifices the performance of the head classes. To address these drawbacks, we propose a novel framework called Retrieval Augmented Hybrid Network (RAHNet) to jointly learn a robust feature extractor and an unbiased classifier in a decoupled manner. In the feature extractor training stage, we develop a graph retrieval module to search for relevant graphs that directly enrich the intra-class diversity for the tail classes. Moreover, we innovatively optimize a category-centered supervised contrastive loss to obtain discriminative representations, which is more suitable for long-tailed scenarios. In the classifier fine-tuning stage, we balance the classifier weights with two weight regularization techniques, i.e., Max-norm and weight decay. Experiments on various popular benchmarks verify the superiority of the proposed method against state-of-the-art approaches.
    Natural and Robust Walking using Reinforcement Learning without Demonstrations in High-Dimensional Musculoskeletal Models. (arXiv:2309.02976v2 [cs.RO] UPDATED)
    Humans excel at robust bipedal walking in complex natural environments. In each step, they adequately tune the interaction of biomechanical muscle dynamics and neuronal signals to be robust against uncertainties in ground conditions. However, it is still not fully understood how the nervous system resolves the musculoskeletal redundancy to solve the multi-objective control problem considering stability, robustness, and energy efficiency. In computer simulations, energy minimization has been shown to be a successful optimization target, reproducing natural walking with trajectory optimization or reflex-based control methods. However, these methods focus on particular motions at a time and the resulting controllers are limited when compensating for perturbations. In robotics, reinforcement learning~(RL) methods recently achieved highly stable (and efficient) locomotion on quadruped systems, but the generation of human-like walking with bipedal biomechanical models has required extensive use of expert data sets. This strong reliance on demonstrations often results in brittle policies and limits the application to new behaviors, especially considering the potential variety of movements for high-dimensional musculoskeletal models in 3D. Achieving natural locomotion with RL without sacrificing its incredible robustness might pave the way for a novel approach to studying human walking in complex natural environments. Videos: https://sites.google.com/view/naturalwalkingrl
    ClimSim: An open large-scale dataset for training high-resolution physics emulators in hybrid multi-scale climate simulators. (arXiv:2306.08754v3 [cs.LG] UPDATED)
    Modern climate projections lack adequate spatial and temporal resolution due to computational constraints. A consequence is inaccurate and imprecise predictions of critical processes such as storms. Hybrid methods that combine physics with machine learning (ML) have introduced a new generation of higher fidelity climate simulators that can sidestep Moore's Law by outsourcing compute-hungry, short, high-resolution simulations to ML emulators. However, this hybrid ML-physics simulation approach requires domain-specific treatment and has been inaccessible to ML experts because of lack of training data and relevant, easy-to-use workflows. We present ClimSim, the largest-ever dataset designed for hybrid ML-physics research. It comprises multi-scale climate simulations, developed by a consortium of climate scientists and ML researchers. It consists of 5.7 billion pairs of multivariate input and output vectors that isolate the influence of locally-nested, high-resolution, high-fidelity physics on a host climate simulator's macro-scale physical state. The dataset is global in coverage, spans multiple years at high sampling frequency, and is designed such that resulting emulators are compatible with downstream coupling into operational climate simulators. We implement a range of deterministic and stochastic regression baselines to highlight the ML challenges and their scoring. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res, https://huggingface.co/datasets/LEAP/ClimSim_low-res, and https://huggingface.co/datasets/LEAP/ClimSim_low-res_aqua-planet) and code (https://leap-stc.github.io/ClimSim) are released openly to support the development of hybrid ML-physics and high-fidelity climate simulations for the benefit of science and society.
    Max-Margin Token Selection in Attention Mechanism. (arXiv:2306.13596v3 [cs.LG] UPDATED)
    Attention mechanism is a central component of the transformer architecture which led to the phenomenal success of large language models. However, the theoretical principles underlying the attention mechanism are poorly understood, especially its nonconvex optimization dynamics. In this work, we explore the seminal softmax-attention model $f(\boldsymbol{X})=\langle \boldsymbol{Xv}, \texttt{softmax}(\boldsymbol{XWp})\rangle$, where $\boldsymbol{X}$ is the token sequence and $(\boldsymbol{v},\boldsymbol{W},\boldsymbol{p})$ are trainable parameters. We prove that running gradient descent on $\boldsymbol{p}$, or equivalently $\boldsymbol{W}$, converges in direction to a max-margin solution that separates $\textit{locally-optimal}$ tokens from non-optimal ones. This clearly formalizes attention as an optimal token selection mechanism. Remarkably, our results are applicable to general data and precisely characterize $\textit{optimality}$ of tokens in terms of the value embeddings $\boldsymbol{Xv}$ and problem geometry. We also provide a broader regularization path analysis that establishes the margin maximizing nature of attention even for nonlinear prediction heads. When optimizing $\boldsymbol{v}$ and $\boldsymbol{p}$ simultaneously with logistic loss, we identify conditions under which the regularization paths directionally converge to their respective hard-margin SVM solutions where $\boldsymbol{v}$ separates the input features based on their labels. Interestingly, the SVM formulation of $\boldsymbol{p}$ is influenced by the support vector geometry of $\boldsymbol{v}$. Finally, we verify our theoretical findings via numerical experiments and provide insights.
    Gradient-Based Feature Learning under Structured Data. (arXiv:2309.03843v1 [stat.ML])
    Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
    Accelerating Numerical Solvers for Large-Scale Simulation of Dynamical System via NeurVec. (arXiv:2208.03680v2 [cs.CE] UPDATED)
    The large-scale simulation of dynamical systems is critical in numerous scientific and engineering disciplines. However, traditional numerical solvers are limited by the choice of step sizes when estimating integration, resulting in a trade-off between accuracy and computational efficiency. To address this challenge, we introduce a deep learning-based corrector called Neural Vector (NeurVec), which can compensate for integration errors and enable larger time step sizes in simulations. Our extensive experiments on a variety of complex dynamical system benchmarks demonstrate that NeurVec exhibits remarkable generalization capability on a continuous phase space, even when trained using limited and discrete data. NeurVec significantly accelerates traditional solvers, achieving speeds tens to hundreds of times faster while maintaining high levels of accuracy and stability. Moreover, NeurVec's simple-yet-effective design, combined with its ease of implementation, has the potential to establish a new paradigm for fast-solving differential equations based on deep learning.
    Convergence Analysis of Decentralized ASGD. (arXiv:2309.03754v1 [cs.LG])
    Over the last decades, Stochastic Gradient Descent (SGD) has been intensively studied by the Machine Learning community. Despite its versatility and excellent performance, the optimization of large models via SGD still is a time-consuming task. To reduce training time, it is common to distribute the training process across multiple devices. Recently, it has been shown that the convergence of asynchronous SGD (ASGD) will always be faster than mini-batch SGD. However, despite these improvements in the theoretical bounds, most ASGD convergence-rate proofs still rely on a centralized parameter server, which is prone to become a bottleneck when scaling out the gradient computations across many distributed processes. In this paper, we present a novel convergence-rate analysis for decentralized and asynchronous SGD (DASGD) which does not require partial synchronization among nodes nor restrictive network topologies. Specifically, we provide a bound of $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(QS_{avg}\epsilon^{-3/2}) + \mathcal{O}(S_{avg}\epsilon^{-1})$ for the convergence rate of DASGD, where $S_{avg}$ is the average staleness between models, $Q$ is a constant that bounds the norm of the gradients, and $\epsilon$ is a (small) error that is allowed within the bound. Furthermore, when gradients are not bounded, we prove the convergence rate of DASGD to be $\mathcal{O}(\sigma\epsilon^{-2}) + \mathcal{O}(\sqrt{\hat{S}_{avg}\hat{S}_{max}}\epsilon^{-1})$, with $\hat{S}_{max}$ and $\hat{S}_{avg}$ representing a loose version of the average and maximum staleness, respectively. Our convergence proof holds for a fixed stepsize and any non-convex, homogeneous, and L-smooth objective function. We anticipate that our results will be of high relevance for the adoption of DASGD by a broad community of researchers and developers.
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v3 [cs.LG] UPDATED)
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $4N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.
    DoLa: Decoding by Contrasting Layers Improves Factuality in Large Language Models. (arXiv:2309.03883v1 [cs.CL])
    Despite their impressive capabilities, large language models (LLMs) are prone to hallucinations, i.e., generating content that deviates from facts seen during pretraining. We propose a simple decoding strategy for reducing hallucinations with pretrained LLMs that does not require conditioning on retrieved external knowledge nor additional fine-tuning. Our approach obtains the next-token distribution by contrasting the differences in logits obtained from projecting the later layers versus earlier layers to the vocabulary space, exploiting the fact that factual knowledge in an LLMs has generally been shown to be localized to particular transformer layers. We find that this Decoding by Contrasting Layers (DoLa) approach is able to better surface factual knowledge and reduce the generation of incorrect facts. DoLa consistently improves the truthfulness across multiple choices tasks and open-ended generation tasks, for example improving the performance of LLaMA family models on TruthfulQA by 12-17% absolute points, demonstrating its potential in making LLMs reliably generate truthful facts.
    Limitation of Characterizing Implicit Regularization by Data-independent Functions. (arXiv:2201.12198v2 [cs.LG] UPDATED)
    In recent years, understanding the implicit regularization of neural networks (NNs) has become a central task in deep learning theory. However, implicit regularization is itself not completely defined and well understood. In this work, we attempt to mathematically define and study implicit regularization. Importantly, we explore the limitations of a common approach to characterizing implicit regularization using data-independent functions. We propose two dynamical mechanisms, i.e., Two-point and One-point Overlapping mechanisms, based on which we provide two recipes for producing classes of one-hidden-neuron NNs that provably cannot be fully characterized by a type of or all data-independent functions. Following the previous works, our results further emphasize the profound data dependency of implicit regularization in general, inspiring us to study in detail the data dependency of NN implicit regularization in the future.
    Ensemble linear interpolators: The role of ensembling. (arXiv:2309.03354v1 [stat.ML])
    Interpolators are unstable. For example, the mininum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparametrized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
    On Root Cause Localization and Anomaly Mitigation through Causal Inference. (arXiv:2212.04031v2 [cs.LG] UPDATED)
    Due to a wide spectrum of applications in the real world, such as security, financial surveillance, and health risk, various deep anomaly detection models have been proposed and achieved state-of-the-art performance. However, besides being effective, in practice, the practitioners would further like to know what causes the abnormal outcome and how to further fix it. In this work, we propose RootCLAM, which aims to achieve Root Cause Localization and Anomaly Mitigation from a causal perspective. Especially, we formulate anomalies caused by external interventions on the normal causal mechanism and aim to locate the abnormal features with external interventions as root causes. After that, we further propose an anomaly mitigation approach that aims to recommend mitigation actions on abnormal features to revert the abnormal outcomes such that the counterfactuals guided by the causal mechanism are normal. Experiments on three datasets show that our approach can locate the root causes and further flip the abnormal labels.
    Alzheimer Disease Detection from Raman Spectroscopy of the Cerebrospinal Fluid via Topological Machine Learning. (arXiv:2309.03664v1 [cs.LG])
    The cerebrospinal fluid (CSF) of 19 subjects who received a clinical diagnosis of Alzheimer's disease (AD) as well as of 5 pathological controls have been collected and analysed by Raman spectroscopy (RS). We investigated whether the raw and preprocessed Raman spectra could be used to distinguish AD from controls. First, we applied standard Machine Learning (ML) methods obtaining unsatisfactory results. Then, we applied ML to a set of topological descriptors extracted from raw spectra, achieving a very good classification accuracy (>87%). Although our results are preliminary, they indicate that RS and topological analysis together may provide an effective combination to confirm or disprove a clinical diagnosis of AD. The next steps will include enlarging the dataset of CSF samples to validate the proposed method better and, possibly, to understand if topological data analysis could support the characterization of AD subtypes.
    Prime and Modulate Learning: Generation of forward models with signed back-propagation and environmental cues. (arXiv:2309.03825v1 [cs.LG])
    Deep neural networks employing error back-propagation for learning can suffer from exploding and vanishing gradient problems. Numerous solutions have been proposed such as normalisation techniques or limiting activation functions to linear rectifying units. In this work we follow a different approach which is particularly applicable to closed-loop learning of forward models where back-propagation makes exclusive use of the sign of the error signal to prime the learning, whilst a global relevance signal modulates the rate of learning. This is inspired by the interaction between local plasticity and a global neuromodulation. For example, whilst driving on an empty road, one can allow for slow step-wise optimisation of actions, whereas, at a busy junction, an error must be corrected at once. Hence, the error is the priming signal and the intensity of the experience is a modulating factor in the weight change. The advantages of this Prime and Modulate paradigm is twofold: it is free from normalisation and it makes use of relevant cues from the environment to enrich the learning. We present a mathematical derivation of the learning rule in z-space and demonstrate the real-time performance with a robotic platform. The results show a significant improvement in the speed of convergence compared to that of the conventional back-propagation.
    Copula Representations and Error Surface Projections for the Exclusive Or Problem. (arXiv:1907.04483v2 [cs.LG] UPDATED)
    The exclusive or (xor) function is one of the simplest examples that illustrate why nonlinear feedforward networks are superior to linear regression for machine learning applications. We review the xor representation and approximation problems and discuss their solutions in terms of probabilistic logic and associative copula functions. After briefly reviewing the specification of feedforward networks, we compare the dynamics of learned error surfaces with different activation functions such as RELU and tanh through a set of colorful three-dimensional charts. The copula representations extend xor from Boolean to real values, thereby providing a convenient way to demonstrate the concept of cross-validation on in-sample and out-sample data sets. Our approach is pedagogical and is meant to be a machine learning prolegomenon.
    Towards provably efficient quantum algorithms for large-scale machine-learning models. (arXiv:2303.03428v4 [quant-ph] UPDATED)
    Large machine learning models are revolutionary technologies of artificial intelligence whose bottlenecks include huge computational expenses, power, and time used both in the pre-training and fine-tuning process. In this work, we show that fault-tolerant quantum computing could possibly provide provably efficient resolutions for generic (stochastic) gradient descent algorithms, scaling as $\mathcal{O}(T^2 \times \text{polylog}(n))$, where $n$ is the size of the models and $T$ is the number of iterations in the training, as long as the models are both sufficiently dissipative and sparse, with small learning rates. Based on earlier efficient quantum algorithms for dissipative differential equations, we find and prove that similar algorithms work for (stochastic) gradient descent, the primary algorithm for machine learning. In practice, we benchmark instances of large machine learning models from 7 million to 103 million parameters. We find that, in the context of sparse training, a quantum enhancement is possible at the early stage of learning after model pruning, motivating a sparse parameter download and re-upload scheme. Our work shows solidly that fault-tolerant quantum algorithms could potentially contribute to most state-of-the-art, large-scale machine-learning problems.
    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming. (arXiv:2209.08901v3 [math.OC] UPDATED)
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.
    Blended-NeRF: Zero-Shot Object Generation and Blending in Existing Neural Radiance Fields. (arXiv:2306.12760v2 [cs.CV] UPDATED)
    Editing a local region or a specific object in a 3D scene represented by a NeRF or consistently blending a new realistic object into the scene is challenging, mainly due to the implicit nature of the scene representation. We present Blended-NeRF, a robust and flexible framework for editing a specific region of interest in an existing NeRF scene, based on text prompts, along with a 3D ROI box. Our method leverages a pretrained language-image model to steer the synthesis towards a user-provided text prompt, along with a 3D MLP model initialized on an existing NeRF scene to generate the object and blend it into a specified region in the original scene. We allow local editing by localizing a 3D ROI box in the input scene, and blend the content synthesized inside the ROI with the existing scene using a novel volumetric blending technique. To obtain natural looking and view-consistent results, we leverage existing and new geometric priors and 3D augmentations for improving the visual fidelity of the final result. We test our framework both qualitatively and quantitatively on a variety of real 3D scenes and text prompts, demonstrating realistic multi-view consistent results with much flexibility and diversity compared to the baselines. Finally, we show the applicability of our framework for several 3D editing applications, including adding new objects to a scene, removing/replacing/altering existing objects, and texture conversion.
    Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models. (arXiv:2307.14971v2 [cs.CV] UPDATED)
    With the overwhelming trend of mask image modeling led by MAE, generative pre-training has shown a remarkable potential to boost the performance of fundamental models in 2D vision. However, in 3D vision, the over-reliance on Transformer-based backbones and the unordered nature of point clouds have restricted the further development of generative pre-training. In this paper, we propose a novel 3D-to-2D generative pre-training method that is adaptable to any point cloud model. We propose to generate view images from different instructed poses via the cross-attention mechanism as the pre-training scheme. Generating view images has more precise supervision than its point cloud counterpart, thus assisting 3D backbones to have a finer comprehension of the geometrical structure and stereoscopic relations of the point cloud. Experimental results have proved the superiority of our proposed 3D-to-2D generative pre-training over previous pre-training methods. Our method is also effective in boosting the performance of architecture-oriented approaches, achieving state-of-the-art performance when fine-tuning on ScanObjectNN classification and ShapeNetPart segmentation tasks. Code is available at https://github.com/wangzy22/TAP.
    Bootstrapping Adaptive Human-Machine Interfaces with Offline Reinforcement Learning. (arXiv:2309.03839v1 [cs.RO])
    Adaptive interfaces can help users perform sequential decision-making tasks like robotic teleoperation given noisy, high-dimensional command signals (e.g., from a brain-computer interface). Recent advances in human-in-the-loop machine learning enable such systems to improve by interacting with users, but tend to be limited by the amount of data that they can collect from individual users in practice. In this paper, we propose a reinforcement learning algorithm to address this by training an interface to map raw command signals to actions using a combination of offline pre-training and online fine-tuning. To address the challenges posed by noisy command signals and sparse rewards, we develop a novel method for representing and inferring the user's long-term intent for a given trajectory. We primarily evaluate our method's ability to assist users who can only communicate through noisy, high-dimensional input channels through a user study in which 12 participants performed a simulated navigation task by using their eye gaze to modulate a 128-dimensional command signal from their webcam. The results show that our method enables successful goal navigation more often than a baseline directional interface, by learning to denoise user commands signals and provide shared autonomy assistance. We further evaluate on a simulated Sawyer pushing task with eye gaze control, and the Lunar Lander game with simulated user commands, and find that our method improves over baseline interfaces in these domains as well. Extensive ablation experiments with simulated user commands empirically motivate each component of our method.
    AnthroNet: Conditional Generation of Humans via Anthropometrics. (arXiv:2309.03812v1 [cs.CV])
    We present a novel human body model formulated by an extensive set of anthropocentric measurements, which is capable of generating a wide range of human body shapes and poses. The proposed model enables direct modeling of specific human identities through a deep generative architecture, which can produce humans in any arbitrary pose. It is the first of its kind to have been trained end-to-end using only synthetically generated data, which not only provides highly accurate human mesh representations but also allows for precise anthropometry of the body. Moreover, using a highly diverse animation library, we articulated our synthetic humans' body and hands to maximize the diversity of the learnable priors for model training. Our model was trained on a dataset of $100k$ procedurally-generated posed human meshes and their corresponding anthropometric measurements. Our synthetic data generator can be used to generate millions of unique human identities and poses for non-commercial academic research purposes.
    ReFit: A Framework for Refinement of Weakly Supervised Semantic Segmentation using Object Border Fitting for Medical Images. (arXiv:2303.07853v2 [cs.CV] UPDATED)
    Weakly Supervised Semantic Segmentation (WSSS) relying only on image-level supervision is a promising approach to deal with the need for Segmentation networks, especially for generating a large number of pixel-wise masks in a given dataset. However, most state-of-the-art image-level WSSS techniques lack an understanding of the geometric features embedded in the images since the network cannot derive any object boundary information from just image-level labels. We define a boundary here as the line separating an object and its background, or two different objects. To address this drawback, we are proposing our novel ReFit framework, which deploys state-of-the-art class activation maps combined with various post-processing techniques in order to achieve fine-grained higher-accuracy segmentation masks. To achieve this, we investigate a state-of-the-art unsupervised segmentation network that can be used to construct a boundary map, which enables ReFit to predict object locations with sharper boundaries. By applying our method to WSSS predictions, we achieved up to 10% improvement over the current state-of-the-art WSSS methods for medical imaging. The framework is open-source, to ensure that our results are reproducible, and accessible online at https://github.com/bharathprabakaran/ReFit.
    Adversarial Likelihood Estimation With One-Way Flows. (arXiv:2307.09882v2 [cs.LG] UPDATED)
    Generative Adversarial Networks (GANs) can produce high-quality samples, but do not provide an estimate of the probability density around the samples. However, it has been noted that maximizing the log-likelihood within an energy-based setting can lead to an adversarial framework where the discriminator provides unnormalized density (often called energy). We further develop this perspective, incorporate importance sampling, and show that 1) Wasserstein GAN performs a biased estimate of the partition function, and we propose instead to use an unbiased estimator; and 2) when optimizing for likelihood, one must maximize generator entropy. This is hypothesized to provide a better mode coverage. Different from previous works, we explicitly compute the density of the generated samples. This is the key enabler to designing an unbiased estimator of the partition function and computation of the generator entropy term. The generator density is obtained via a new type of flow network, called one-way flow network, that is less constrained in terms of architecture, as it does not require a tractable inverse function. Our experimental results show that our method converges faster, produces comparable sample quality to GANs with similar architecture, successfully avoids over-fitting to commonly used datasets and produces smooth low-dimensional latent representations of the training data.
    Pure Exploration in Bandits with Linear Constraints. (arXiv:2306.12774v2 [cs.LG] UPDATED)
    We address the problem of identifying the optimal policy with a fixed confidence level in a multi-armed bandit setup, when \emph{the arms are subject to linear constraints}. Unlike the standard best-arm identification problem which is well studied, the optimal policy in this case may not be deterministic and could mix between several arms. This changes the geometry of the problem which we characterize via an information-theoretic lower bound. We introduce two asymptotically optimal algorithms for this setting, one based on the Track-and-Stop method and the other based on a game-theoretic approach. Both these algorithms try to track an optimal allocation based on the lower bound and computed by a weighted projection onto the boundary of a normal cone. Finally, we provide empirical results that validate our bounds and visualize how constraints change the hardness of the problem.
    Better Practices for Domain Adaptation. (arXiv:2309.03879v1 [cs.LG])
    Distribution shifts are all too common in real-world applications of machine learning. Domain adaptation (DA) aims to address this by providing various frameworks for adapting models to the deployment data without using labels. However, the domain shift scenario raises a second more subtle challenge: the difficulty of performing hyperparameter optimisation (HPO) for these adaptation algorithms without access to a labelled validation set. The unclear validation protocol for DA has led to bad practices in the literature, such as performing HPO using the target test labels when, in real-world scenarios, they are not available. This has resulted in over-optimism about DA research progress compared to reality. In this paper, we analyse the state of DA when using good evaluation practice, by benchmarking a suite of candidate validation criteria and using them to assess popular adaptation algorithms. We show that there are challenges across all three branches of domain adaptation methodology including Unsupervised Domain Adaptation (UDA), Source-Free Domain Adaptation (SFDA), and Test Time Adaptation (TTA). While the results show that realistically achievable performance is often worse than expected, they also show that using proper validation splits is beneficial, as well as showing that some previously unexplored validation metrics provide the best options to date. Altogether, our improved practices covering data, training, validation and hyperparameter optimisation form a new rigorous pipeline to improve benchmarking, and hence research progress, within this important field going forward.
    Dataset Generation and Bonobo Classification from Weakly Labelled Videos. (arXiv:2309.03671v1 [cs.CV])
    This paper presents a bonobo detection and classification pipeline built from the commonly used machine learning methods. Such application is motivated by the need to test bonobos in their enclosure using touch screen devices without human assistance. This work introduces a newly acquired dataset based on bonobo recordings generated semi-automatically. The recordings are weakly labelled and fed to a macaque detector in order to spatially detect the individual present in the video. Handcrafted features coupled with different classification algorithms and deep-learning methods using a ResNet architecture are investigated for bonobo identification. Performance is compared in terms of classification accuracy on the splits of the database using different data separation methods. We demonstrate the importance of data preparation and how a wrong data separation can lead to false good results. Finally, after a meaningful separation of the data, the best classification performance is obtained using a fine-tuned ResNet model and reaches 75% of accuracy.
    A computationally lightweight safe learning algorithm. (arXiv:2309.03672v1 [eess.SY])
    Safety is an essential asset when learning control policies for physical systems, as violating safety constraints during training can lead to expensive hardware damage. In response to this need, the field of safe learning has emerged with algorithms that can provide probabilistic safety guarantees without knowledge of the underlying system dynamics. Those algorithms often rely on Gaussian process inference. Unfortunately, Gaussian process inference scales cubically with the number of data points, limiting applicability to high-dimensional and embedded systems. In this paper, we propose a safe learning algorithm that provides probabilistic safety guarantees but leverages the Nadaraya-Watson estimator instead of Gaussian processes. For the Nadaraya-Watson estimator, we can reach logarithmic scaling with the number of data points. We provide theoretical guarantees for the estimates, embed them into a safe learning algorithm, and show numerical experiments on a simulated seven-degrees-of-freedom robot manipulator.
    Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck. (arXiv:2309.03800v1 [cs.LG])
    This work investigates the nuanced algorithm design choices for deep learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
    CPU frequency scheduling of real-time applications on embedded devices with temporal encoding-based deep reinforcement learning. (arXiv:2309.03779v1 [cs.LG])
    Small devices are frequently used in IoT and smart-city applications to perform periodic dedicated tasks with soft deadlines. This work focuses on developing methods to derive efficient power-management methods for periodic tasks on small devices. We first study the limitations of the existing Linux built-in methods used in small devices. We illustrate three typical workload/system patterns that are challenging to manage with Linux's built-in solutions. We develop a reinforcement-learning-based technique with temporal encoding to derive an effective DVFS governor even with the presence of the three system patterns. The derived governor uses only one performance counter, the same as the built-in Linux mechanism, and does not require an explicit task model for the workload. We implemented a prototype system on the Nvidia Jetson Nano Board and experimented with it with six applications, including two self-designed and four benchmark applications. Under different deadline constraints, our approach can quickly derive a DVFS governor that can adapt to performance requirements and outperform the built-in Linux approach in energy saving. On Mibench workloads, with performance slack ranging from 0.04 s to 0.4 s, the proposed method can save 3% - 11% more energy compared to Ondemand. AudioReg and FaceReg applications tested have 5%- 14% energy-saving improvement. We have open-sourced the implementation of our in-kernel quantized neural network engine. The codebase can be found at: https://github.com/coladog/tinyagent.
    Your Battery Is a Blast! Safeguarding Against Counterfeit Batteries with Authentication. (arXiv:2309.03607v1 [cs.CR])
    Lithium-ion (Li-ion) batteries are the primary power source in various applications due to their high energy and power density. Their market was estimated to be up to 48 billion U.S. dollars in 2022. However, the widespread adoption of Li-ion batteries has resulted in counterfeit cell production, which can pose safety hazards to users. Counterfeit cells can cause explosions or fires, and their prevalence in the market makes it difficult for users to detect fake cells. Indeed, current battery authentication methods can be susceptible to advanced counterfeiting techniques and are often not adaptable to various cells and systems. In this paper, we improve the state of the art on battery authentication by proposing two novel methodologies, DCAuth and EISthentication, which leverage the internal characteristics of each cell through Machine Learning models. Our methods automatically authenticate lithium-ion battery models and architectures using data from their regular usage without the need for any external device. They are also resilient to the most common and critical counterfeit practices and can scale to several batteries and devices. To evaluate the effectiveness of our proposed methodologies, we analyze time-series data from a total of 20 datasets that we have processed to extract meaningful features for our analysis. Our methods achieve high accuracy in battery authentication for both architectures (up to 0.99) and models (up to 0.96). Moreover, our methods offer comparable identification performances. By using our proposed methodologies, manufacturers can ensure that devices only use legitimate batteries, guaranteeing the operational state of any system and safety measures for the users.
    Temporal Inductive Path Neural Network for Temporal Knowledge Graph Reasoning. (arXiv:2309.03251v1 [cs.AI])
    Temporal Knowledge Graph (TKG) is an extension of traditional Knowledge Graph (KG) that incorporates the dimension of time. Reasoning on TKGs is a crucial task that aims to predict future facts based on historical occurrences. The key challenge lies in uncovering structural dependencies within historical subgraphs and temporal patterns. Most existing approaches model TKGs relying on entity modeling, as nodes in the graph play a crucial role in knowledge representation. However, the real-world scenario often involves an extensive number of entities, with new entities emerging over time. This makes it challenging for entity-dependent methods to cope with extensive volumes of entities, and effectively handling newly emerging entities also becomes a significant challenge. Therefore, we propose Temporal Inductive Path Neural Network (TiPNN), which models historical information in an entity-independent perspective. Specifically, TiPNN adopts a unified graph, namely history temporal graph, to comprehensively capture and encapsulate information from history. Subsequently, we utilize the defined query-aware temporal paths to model historical path information related to queries on history temporal graph for the reasoning. Extensive experiments illustrate that the proposed model not only attains significant performance enhancements but also handles inductive settings, while additionally facilitating the provision of reasoning evidence through history temporal graphs.
    Feature Enhancer Segmentation Network (FES-Net) for Vessel Segmentation. (arXiv:2309.03535v1 [eess.IV])
    Diseases such as diabetic retinopathy and age-related macular degeneration pose a significant risk to vision, highlighting the importance of precise segmentation of retinal vessels for the tracking and diagnosis of progression. However, existing vessel segmentation methods that heavily rely on encoder-decoder structures struggle to capture contextual information about retinal vessel configurations, leading to challenges in reconciling semantic disparities between encoder and decoder features. To address this, we propose a novel feature enhancement segmentation network (FES-Net) that achieves accurate pixel-wise segmentation without requiring additional image enhancement steps. FES-Net directly processes the input image and utilizes four prompt convolutional blocks (PCBs) during downsampling, complemented by a shallow upsampling approach to generate a binary mask for each class. We evaluate the performance of FES-Net on four publicly available state-of-the-art datasets: DRIVE, STARE, CHASE, and HRF. The evaluation results clearly demonstrate the superior performance of FES-Net compared to other competitive approaches documented in the existing literature.
    A Majority Invariant Approach to Patch Robustness Certification for Deep Learning Models. (arXiv:2308.00452v2 [cs.LG] UPDATED)
    Patch robustness certification ensures no patch within a given bound on a sample can manipulate a deep learning model to predict a different label. However, existing techniques cannot certify samples that cannot meet their strict bars at the classifier or patch region levels. This paper proposes MajorCert. MajorCert firstly finds all possible label sets manipulatable by the same patch region on the same sample across the underlying classifiers, then enumerates their combinations element-wise, and finally checks whether the majority invariant of all these combinations is intact to certify samples.
    Fast FixMatch: Faster Semi-Supervised Learning with Curriculum Batch Size. (arXiv:2309.03469v1 [cs.LG])
    Advances in Semi-Supervised Learning (SSL) have almost entirely closed the gap between SSL and Supervised Learning at a fraction of the number of labels. However, recent performance improvements have often come \textit{at the cost of significantly increased training computation}. To address this, we propose Curriculum Batch Size (CBS), \textit{an unlabeled batch size curriculum which exploits the natural training dynamics of deep neural networks.} A small unlabeled batch size is used in the beginning of training and is gradually increased to the end of training. A fixed curriculum is used regardless of dataset, model or number of epochs, and reduced training computations is demonstrated on all settings. We apply CBS, strong labeled augmentation, Curriculum Pseudo Labeling (CPL) \citep{FlexMatch} to FixMatch \citep{FixMatch} and term the new SSL algorithm Fast FixMatch. We perform an ablation study to show that strong labeled augmentation and/or CPL do not significantly reduce training computations, but, in synergy with CBS, they achieve optimal performance. Fast FixMatch also achieves substantially higher data utilization compared to previous state-of-the-art. Fast FixMatch achieves between $2.1\times$ - $3.4\times$ reduced training computations on CIFAR-10 with all but 40, 250 and 4000 labels removed, compared to vanilla FixMatch, while attaining the same cited state-of-the-art error rate \citep{FixMatch}. Similar results are achieved for CIFAR-100, SVHN and STL-10. Finally, Fast MixMatch achieves between $2.6\times$ - $3.3\times$ reduced training computations in federated SSL tasks and online/streaming learning SSL tasks, which further demonstrate the generializbility of Fast MixMatch to different scenarios and tasks.
    DTW+S: Shape-based Comparison of Time-series with Ordered Local Trend. (arXiv:2309.03579v1 [cs.LG])
    Measuring distance or similarity between time-series data is a fundamental aspect of many applications including classification and clustering. Existing measures may fail to capture similarities due to local trends (shapes) and may even produce misleading results. Our goal is to develop a measure that looks for similar trends occurring around similar times and is easily interpretable for researchers in applied domains. This is particularly useful for applications where time-series have a sequence of meaningful local trends that are ordered, such as in epidemics (a surge to an increase to a peak to a decrease). We propose a novel measure, DTW+S, which creates an interpretable "closeness-preserving" matrix representation of the time-series, where each column represents local trends, and then it applies Dynamic Time Warping to compute distances between these matrices. We present a theoretical analysis that supports the choice of this representation. We demonstrate the utility of DTW+S in ensemble building and clustering of epidemic curves. We also demonstrate that our approach results in better classification compared to Dynamic Time Warping for a class of datasets, particularly when local trends rather than scale play a decisive role.
    GraPhSyM: Graph Physical Synthesis Model. (arXiv:2308.03944v2 [cs.LG] UPDATED)
    In this work, we introduce GraPhSyM, a Graph Attention Network (GATv2) model for fast and accurate estimation of post-physical synthesis circuit delay and area metrics from pre-physical synthesis circuit netlists. Once trained, GraPhSyM provides accurate visibility of final design metrics to early EDA stages, such as logic synthesis, without running the slow physical synthesis flow, enabling global co-optimization across stages. Additionally, the swift and precise feedback provided by GraPhSyM is instrumental for machine-learning-based EDA optimization frameworks. Given a gate-level netlist of a circuit represented as a graph, GraPhSyM utilizes graph structure, connectivity, and electrical property features to predict the impact of physical synthesis transformations such as buffer insertion and gate sizing. When trained on a dataset of 6000 prefix adder designs synthesized at an aggressive delay target, GraPhSyM can accurately predict the post-synthesis delay (98.3%) and area (96.1%) metrics of unseen adders with a fast 0.22s inference time. Furthermore, we illustrate the compositionality of GraPhSyM by employing the model trained on a fixed delay target to accurately anticipate post-synthesis metrics at a variety of unseen delay targets. Lastly, we report promising generalization capabilities of the GraPhSyM model when it is evaluated on circuits different from the adders it was exclusively trained on. The results show the potential for GraPhSyM to serve as a powerful tool for advanced optimization techniques and as an oracle for EDA machine learning frameworks.
    A State Representation for Diminishing Rewards. (arXiv:2309.03710v1 [cs.LG])
    A common setting in multitask reinforcement learning (RL) demands that an agent rapidly adapt to various stationary reward functions randomly sampled from a fixed distribution. In such situations, the successor representation (SR) is a popular framework which supports rapid policy evaluation by decoupling a policy's expected discounted, cumulative state occupancies from a specific reward function. However, in the natural world, sequential tasks are rarely independent, and instead reflect shifting priorities based on the availability and subjective perception of rewarding stimuli. Reflecting this disjunction, in this paper we study the phenomenon of diminishing marginal utility and introduce a novel state representation, the $\lambda$ representation ($\lambda$R) which, surprisingly, is required for policy evaluation in this setting and which generalizes the SR as well as several other state representations from the literature. We establish the $\lambda$R's formal properties and examine its normative advantages in the context of machine learning, as well as its usefulness for studying natural behaviors, particularly foraging.
    DeepAD: A Robust Deep Learning Model of Alzheimer's Disease Progression for Real-World Clinical Applications. (arXiv:2203.09096v5 [cs.LG] UPDATED)
    The ability to predict the future trajectory of a patient is a key step toward the development of therapeutics for complex diseases such as Alzheimer's disease (AD). However, most machine learning approaches developed for prediction of disease progression are either single-task or single-modality models, which can not be directly adopted to our setting involving multi-task learning with high dimensional images. Moreover, most of those approaches are trained on a single dataset (i.e. cohort), which can not be generalized to other cohorts. We propose a novel multimodal multi-task deep learning model to predict AD progression by analyzing longitudinal clinical and neuroimaging data from multiple cohorts. Our proposed model integrates high dimensional MRI features from a 3D convolutional neural network with other data modalities, including clinical and demographic information, to predict the future trajectory of patients. Our model employs an adversarial loss to alleviate the study-specific imaging bias, in particular the inter-study domain shifts. In addition, a Sharpness-Aware Minimization (SAM) optimization technique is applied to further improve model generalization. The proposed model is trained and tested on various datasets in order to evaluate and validate the results. Our results showed that 1) our model yields significant improvement over the baseline models, and 2) models using extracted neuroimaging features from 3D convolutional neural network outperform the same models when applied to MRI-derived volumetric features.
    Scalable Learning of Intrusion Responses through Recursive Decomposition. (arXiv:2309.03292v1 [eess.SY])
    We study automated intrusion response for an IT infrastructure and formulate the interaction between an attacker and a defender as a partially observed stochastic game. To solve the game we follow an approach where attack and defense strategies co-evolve through reinforcement learning and self-play toward an equilibrium. Solutions proposed in previous work prove the feasibility of this approach for small infrastructures but do not scale to realistic scenarios due to the exponential growth in computational complexity with the infrastructure size. We address this problem by introducing a method that recursively decomposes the game into subgames which can be solved in parallel. Applying optimal stopping theory we show that the best response strategies in these subgames exhibit threshold structures, which allows us to compute them efficiently. To solve the decomposed game we introduce an algorithm called Decompositional Fictitious Self-Play (DFSP), which learns Nash equilibria through stochastic approximation. We evaluate the learned strategies in an emulation environment where real intrusions and response actions can be executed. The results show that the learned strategies approximate an equilibrium and that DFSP significantly outperforms a state-of-the-art algorithm for a realistic infrastructure configuration.
    Implicit Design Choices and Their Impact on Emotion Recognition Model Development and Evaluation. (arXiv:2309.03238v1 [cs.LG])
    Emotion recognition is a complex task due to the inherent subjectivity in both the perception and production of emotions. The subjectivity of emotions poses significant challenges in developing accurate and robust computational models. This thesis examines critical facets of emotion recognition, beginning with the collection of diverse datasets that account for psychological factors in emotion production. To handle the challenge of non-representative training data, this work collects the Multimodal Stressed Emotion dataset, which introduces controlled stressors during data collection to better represent real-world influences on emotion production. To address issues with label subjectivity, this research comprehensively analyzes how data augmentation techniques and annotation schemes impact emotion perception and annotator labels. It further handles natural confounding variables and variations by employing adversarial networks to isolate key factors like stress from learned emotion representations during model training. For tackling concerns about leakage of sensitive demographic variables, this work leverages adversarial learning to strip sensitive demographic information from multimodal encodings. Additionally, it proposes optimized sociological evaluation metrics aligned with cost-effective, real-world needs for model testing. This research advances robust, practical emotion recognition through multifaceted studies of challenges in datasets, labels, modeling, demographic and membership variable encoding in representations, and evaluation. The groundwork has been laid for cost-effective, generalizable emotion recognition models that are less likely to encode sensitive demographic information.
    A Generalized Bandsplit Neural Network for Cinematic Audio Source Separation. (arXiv:2309.02539v2 [eess.AS] UPDATED)
    Cinematic audio source separation is a relatively new subtask of audio source separation, with the aim of extracting the dialogue stem, the music stem, and the effects stem from their mixture. In this work, we developed a model generalizing the Bandsplit RNN for any complete or overcomplete partitions of the frequency axis. Psycho-acoustically motivated frequency scales were used to inform the band definitions which are now defined with redundancy for more reliable feature extraction. A loss function motivated by the signal-to-noise ratio and the sparsity-promoting property of the 1-norm was proposed. We additionally exploit the information-sharing property of a common-encoder setup to reduce computational complexity during both training and inference, improve separation performance for hard-to-generalize classes of sounds, and allow flexibility during inference time with easily detachable decoders. Our best model sets the state of the art on the Divide and Remaster dataset with performance above the ideal ratio mask for the dialogue stem.
    CenTime: Event-Conditional Modelling of Censoring in Survival Analysis. (arXiv:2309.03851v1 [cs.LG])
    Survival analysis is a valuable tool for estimating the time until specific events, such as death or cancer recurrence, based on baseline observations. This is particularly useful in healthcare to prognostically predict clinically important events based on patient data. However, existing approaches often have limitations; some focus only on ranking patients by survivability, neglecting to estimate the actual event time, while others treat the problem as a classification task, ignoring the inherent time-ordered structure of the events. Furthermore, the effective utilization of censored samples - training data points where the exact event time is unknown - is essential for improving the predictive accuracy of the model. In this paper, we introduce CenTime, a novel approach to survival analysis that directly estimates the time to event. Our method features an innovative event-conditional censoring mechanism that performs robustly even when uncensored data is scarce. We demonstrate that our approach forms a consistent estimator for the event model parameters, even in the absence of uncensored data. Furthermore, CenTime is easily integrated with deep learning models with no restrictions on batch size or the number of uncensored samples. We compare our approach with standard survival analysis methods, including the Cox proportional-hazard model and DeepHit. Our results indicate that CenTime offers state-of-the-art performance in predicting time-to-death while maintaining comparable ranking performance. Our implementation is publicly available at https://github.com/ahmedhshahin/CenTime.
    VLUCI: Variational Learning of Unobserved Confounders for Counterfactual Inference. (arXiv:2308.00904v2 [cs.LG] UPDATED)
    Causal inference plays a vital role in diverse domains like epidemiology, healthcare, and economics. De-confounding and counterfactual prediction in observational data has emerged as a prominent concern in causal inference research. While existing models tackle observed confounders, the presence of unobserved confounders remains a significant challenge, distorting causal inference and impacting counterfactual outcome accuracy. To address this, we propose a novel variational learning model of unobserved confounders for counterfactual inference (VLUCI), which generates the posterior distribution of unobserved confounders. VLUCI relaxes the unconfoundedness assumption often overlooked by most causal inference methods. By disentangling observed and unobserved confounders, VLUCI constructs a doubly variational inference model to approximate the distribution of unobserved confounders, which are used for inferring more accurate counterfactual outcomes. Extensive experiments on synthetic and semi-synthetic datasets demonstrate VLUCI's superior performance in inferring unobserved confounders. It is compatible with state-of-the-art counterfactual inference models, significantly improving inference accuracy at both group and individual levels. Additionally, VLUCI provides confidence intervals for counterfactual outcomes, aiding decision-making in risk-sensitive domains. We further clarify the considerations when applying VLUCI to cases where unobserved confounders don't strictly conform to our model assumptions using the public IHDP dataset as an example, highlighting the practical advantages of VLUCI.
    DiFaReli: Diffusion Face Relighting. (arXiv:2304.09479v3 [cs.CV] UPDATED)
    We present a novel approach to single-view face relighting in the wild. Handling non-diffuse effects, such as global illumination or cast shadows, has long been a challenge in face relighting. Prior work often assumes Lambertian surfaces, simplified lighting models or involves estimating 3D shape, albedo, or a shadow map. This estimation, however, is error-prone and requires many training examples with lighting ground truth to generalize well. Our work bypasses the need for accurate estimation of intrinsic components and can be trained solely on 2D images without any light stage data, multi-view images, or lighting ground truth. Our key idea is to leverage a conditional diffusion implicit model (DDIM) for decoding a disentangled light encoding along with other encodings related to 3D shape and facial identity inferred from off-the-shelf estimators. We also propose a novel conditioning technique that eases the modeling of the complex interaction between light and geometry by using a rendered shading reference to spatially modulate the DDIM. We achieve state-of-the-art performance on standard benchmark Multi-PIE and can photorealistically relight in-the-wild images. Please visit our page: https://diffusion-face-relighting.github.io
    Examining the Effectiveness of Chatbots in Gathering Family History Information in Comparison to the Standard In-Person Interview-Based Approach. (arXiv:2309.03223v1 [cs.HC])
    One of the most common things that a genealogist is tasked with is the gathering of a person's initial family history, normally via in-person interviews or with the use of a platform such as ancestry.com, as this can provide a strong foundation upon which a genealogist may build. However, the ability to conduct these interviews can often be hindered by both geographical constraints and the technical proficiency of the interviewee, as the interviewee in these types of interviews is most often an elderly person with a lower than average level of technical proficiency. With this in mind, this study presents what we believe, based on prior research, to be the first chatbot geared entirely towards the gathering of family histories, and explores the viability of utilising such a chatbot by comparing the performance and usability of such a method with the aforementioned alternatives. With a chatbot-based approach, we show that, though the average time taken to conduct an interview may be longer than if the user had used ancestry.com or participated in an in-person interview, the number of mistakes made and the level of confusion from the user regarding the UI and process required is lower than the other two methods. Note that the final metric regarding the user's confusion is not applicable for the in-person interview sessions due to its lack of a UI. With refinement, we believe this use of a chatbot could be a valuable tool for genealogists, especially when dealing with interviewees who are based in other countries where it is not possible to conduct an in-person interview.
    Neural lasso: a unifying approach of lasso and neural networks. (arXiv:2309.03770v1 [stat.ML])
    In recent years, there is a growing interest in combining techniques attributed to the areas of Statistics and Machine Learning in order to obtain the benefits of both approaches. In this article, the statistical technique lasso for variable selection is represented through a neural network. It is observed that, although both the statistical approach and its neural version have the same objective function, they differ due to their optimization. In particular, the neural version is usually optimized in one-step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborated optimization of the statistical method results in more accurate parameter estimation, especially when the training set is small. For this reason, a modification of the standard approach for training neural networks, that mimics the statistical framework, is proposed. During the development of the above modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results, using synthetic and real data sets, show that this new optimization algorithm achieves better performance than any of the three previous optimization approaches.
    Models of human preference for learning reward functions. (arXiv:2206.02231v3 [cs.LG] UPDATED)
    The utility of reinforcement learning is limited by the alignment of reward functions with the interests of human stakeholders. One promising method for alignment is to learn the reward function from human-generated preferences between pairs of trajectory segments, a type of reinforcement learning from human feedback (RLHF). These human preferences are typically assumed to be informed solely by partial return, the sum of rewards along each segment. We find this assumption to be flawed and propose modeling human preferences instead as informed by each segment's regret, a measure of a segment's deviation from optimal decision-making. Given infinitely many preferences generated according to regret, we prove that we can identify a reward function equivalent to the reward function that generated those preferences, and we prove that the previous partial return model lacks this identifiability property in multiple contexts. We empirically show that our proposed regret preference model outperforms the partial return preference model with finite training data in otherwise the same setting. Additionally, we find that our proposed regret preference model better predicts real human preferences and also learns reward functions from these preferences that lead to policies that are better human-aligned. Overall, this work establishes that the choice of preference model is impactful, and our proposed regret preference model provides an improvement upon a core assumption of recent research. We have open sourced our experimental code, the human preferences dataset we gathered, and our training and preference elicitation interfaces for gathering a such a dataset.
    Internet Explorer: Targeted Representation Learning on the Open Web. (arXiv:2302.14051v2 [cs.LG] UPDATED)
    Modern vision models typically rely on fine-tuning general-purpose models pre-trained on large, static datasets. These general-purpose models only capture the knowledge within their pre-training datasets, which are tiny, out-of-date snapshots of the Internet -- where billions of images are uploaded each day. We suggest an alternate approach: rather than hoping our static datasets transfer to our desired tasks after large-scale pre-training, we propose dynamically utilizing the Internet to quickly train a small-scale model that does extremely well on the task at hand. Our approach, called Internet Explorer, explores the web in a self-supervised manner to progressively find relevant examples that improve performance on a desired target dataset. It cycles between searching for images on the Internet with text queries, self-supervised training on downloaded images, determining which images were useful, and prioritizing what to search for next. We evaluate Internet Explorer across several datasets and show that it outperforms or matches CLIP oracle performance by using just a single GPU desktop to actively query the Internet for 30--40 hours. Results, visualizations, and videos at https://internet-explorer-ssl.github.io/
    Off-policy Evaluation in Doubly Inhomogeneous Environments. (arXiv:2306.08719v2 [stat.ME] UPDATED)
    This work aims to study off-policy evaluation (OPE) under scenarios where two key reinforcement learning (RL) assumptions -- temporal stationarity and individual homogeneity are both violated. To handle the ``double inhomogeneities", we propose a class of latent factor models for the reward and observation transition functions, under which we develop a general OPE framework that consists of both model-based and model-free approaches. To our knowledge, this is the first paper that develops statistically sound OPE methods in offline RL with double inhomogeneities. It contributes to a deeper understanding of OPE in environments, where standard RL assumptions are not met, and provides several practical approaches in these settings. We establish the theoretical properties of the proposed value estimators and empirically show that our approach outperforms competing methods that ignore either temporal nonstationarity or individual heterogeneity. Finally, we illustrate our method on a data set from the Medical Information Mart for Intensive Care.
    Evaluation of Reinforcement Learning Techniques for Trading on a Diverse Portfolio. (arXiv:2309.03202v1 [q-fin.TR])
    This work seeks to answer key research questions regarding the viability of reinforcement learning over the S&P 500 index. The on-policy techniques of Value Iteration (VI) and State-action-reward-state-action (SARSA) are implemented along with the off-policy technique of Q-Learning. The models are trained and tested on a dataset comprising multiple years of stock market data from 2000-2023. The analysis presents the results and findings from training and testing the models using two different time periods: one including the COVID-19 pandemic years and one excluding them. The results indicate that including market data from the COVID-19 period in the training dataset leads to superior performance compared to the baseline strategies. During testing, the on-policy approaches (VI and SARSA) outperform Q-learning, highlighting the influence of bias-variance tradeoff and the generalization capabilities of simpler policies. However, it is noted that the performance of Q-learning may vary depending on the stability of future market conditions. Future work is suggested, including experiments with updated Q-learning policies during testing and trading diverse individual stocks. Additionally, the exploration of alternative economic indicators for training the models is proposed.
    How adversarial attacks can disrupt seemingly stable accurate classifiers. (arXiv:2309.03665v1 [cs.LG])
    Adversarial attacks dramatically change the output of an otherwise accurate learning system using a seemingly inconsequential modification to a piece of input data. Paradoxically, empirical evidence indicates that even systems which are robust to large random perturbations of the input data remain susceptible to small, easily constructed, adversarial perturbations of their inputs. Here, we show that this may be seen as a fundamental feature of classifiers working with high dimensional input data. We introduce a simple generic and generalisable framework for which key behaviours observed in practical systems arise with high probability -- notably the simultaneous susceptibility of the (otherwise accurate) model to easily constructed adversarial attacks, and robustness to random perturbations of the input data. We confirm that the same phenomena are directly observed in practical neural networks trained on standard image classification problems, where even large additive random noise fails to trigger the adversarial instability of the network. A surprising takeaway is that even small margins separating a classifier's decision surface from training and testing data can hide adversarial susceptibility from being detected using randomly sampled perturbations. Counterintuitively, using additive noise during training or testing is therefore inefficient for eradicating or detecting adversarial examples, and more demanding adversarial training is required.
    LB-SimTSC: An Efficient Similarity-Aware Graph Neural Network for Semi-Supervised Time Series Classification. (arXiv:2301.04838v3 [cs.LG] UPDATED)
    Time series classification is an important data mining task that has received a lot of interest in the past two decades. Due to the label scarcity in practice, semi-supervised time series classification with only a few labeled samples has become popular. Recently, Similarity-aware Time Series Classification (SimTSC) is proposed to address this problem by using a graph neural network classification model on the graph generated from pairwise Dynamic Time Warping (DTW) distance of batch data. It shows excellent accuracy and outperforms state-of-the-art deep learning models in several few-label settings. However, since SimTSC relies on pairwise DTW distances, the quadratic complexity of DTW limits its usability to only reasonably sized datasets. To address this challenge, we propose a new efficient semi-supervised time series classification technique, LB-SimTSC, with a new graph construction module. Instead of using DTW, we propose to utilize a lower bound of DTW, LB_Keogh, to approximate the dissimilarity between instances in linear time, while retaining the relative proximity relationships one would have obtained via computing DTW. We construct the pairwise distance matrix using LB_Keogh and build a graph for the graph neural network. We apply this approach to the ten largest datasets from the well-known UCR time series classification archive. The results demonstrate that this approach can be up to 104x faster than SimTSC when constructing the graph on large datasets without significantly decreasing classification accuracy.  ( 3 min )
    Learning a Patent-Informed Biomedical Knowledge Graph Reveals Technological Potential of Drug Repositioning Candidates. (arXiv:2309.03227v1 [cs.AI])
    Drug repositioning-a promising strategy for discovering new therapeutic uses for existing drugs-has been increasingly explored in the computational science literature using biomedical databases. However, the technological potential of drug repositioning candidates has often been overlooked. This study presents a novel protocol to comprehensively analyse various sources such as pharmaceutical patents and biomedical databases, and identify drug repositioning candidates with both technological potential and scientific evidence. To this end, first, we constructed a scientific biomedical knowledge graph (s-BKG) comprising relationships between drugs, diseases, and genes derived from biomedical databases. Our protocol involves identifying drugs that exhibit limited association with the target disease but are closely located in the s-BKG, as potential drug candidates. We constructed a patent-informed biomedical knowledge graph (p-BKG) by adding pharmaceutical patent information. Finally, we developed a graph embedding protocol to ascertain the structure of the p-BKG, thereby calculating the relevance scores of those candidates with target disease-related patents to evaluate their technological potential. Our case study on Alzheimer's disease demonstrates its efficacy and feasibility, while the quantitative outcomes and systematic methods are expected to bridge the gap between computational discoveries and successful market applications in drug repositioning research.  ( 2 min )
    Insights Into the Inner Workings of Transformer Models for Protein Function Prediction. (arXiv:2309.03631v1 [cs.LG])
    Motivation: We explored how explainable AI (XAI) can help to shed light into the inner workings of neural networks for protein function prediction, by extending the widely used XAI method of integrated gradients such that latent representations inside of transformer models, which were finetuned to Gene Ontology term and Enzyme Commission number prediction, can be inspected too. Results: The approach enabled us to identify amino acids in the sequences that the transformers pay particular attention to, and to show that these relevant sequence parts reflect expectations from biology and chemistry, both in the embedding layer and inside of the model, where we identified transformer heads with a statistically significant correspondence of attribution maps with ground truth sequence annotations (e.g., transmembrane regions, active sites) across many proteins. Availability and Implementation: Source code can be accessed at https://github.com/markuswenzel/xai-proteins .  ( 2 min )
    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples. (arXiv:2309.03847v1 [stat.ML])
    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].  ( 2 min )
    MVD:A Novel Methodology and Dataset for Acoustic Vehicle Type Classification. (arXiv:2309.03544v1 [cs.SD])
    Rising urban populations have led to a surge in vehicle use and made traffic monitoring and management indispensable. Acoustic traffic monitoring (ATM) offers a cost-effective and efficient alternative to more computationally expensive methods of monitoring traffic such as those involving computer vision technologies. In this paper, we present MVD and MVDA: two open datasets for the development of acoustic traffic monitoring and vehicle-type classification algorithms, which contain audio recordings of moving vehicles. The dataset contain four classes- Trucks, Cars, Motorbikes, and a No-vehicle class. Additionally, we propose a novel and efficient way to accurately classify these acoustic signals using cepstrum and spectrum based local and global audio features, and a multi-input neural network. Experimental results show that our methodology improves upon the established baselines of previous works and achieves an accuracy of 91.98% and 96.66% on MVD and MVDA Datasets, respectively. Finally, the proposed model was deployed through an Android application to make it accessible for testing and demonstrate its efficacy.  ( 2 min )
    DiffDefense: Defending against Adversarial Attacks via Diffusion Models. (arXiv:2309.03702v1 [cs.LG])
    This paper presents a novel reconstruction method that leverages Diffusion Models to protect machine learning classifiers against adversarial attacks, all without requiring any modifications to the classifiers themselves. The susceptibility of machine learning models to minor input perturbations renders them vulnerable to adversarial attacks. While diffusion-based methods are typically disregarded for adversarial defense due to their slow reverse process, this paper demonstrates that our proposed method offers robustness against adversarial threats while preserving clean accuracy, speed, and plug-and-play compatibility. Code at: https://github.com/HondamunigePrasannaSilva/DiffDefence.  ( 2 min )
    Kernelized Concept Erasure. (arXiv:2201.12191v4 [cs.LG] UPDATED)
    The representation space of neural models for textual data emerges in an unsupervised manner during training. Understanding how those representations encode human-interpretable concepts is a fundamental problem. One prominent approach for the identification of concepts in neural representations is searching for a linear subspace whose erasure prevents the prediction of the concept from the representations. However, while many linear erasure algorithms are tractable and interpretable, neural networks do not necessarily represent concepts in a linear manner. To identify non-linearly encoded concepts, we propose a kernelization of a linear minimax game for concept erasure. We demonstrate that it is possible to prevent specific non-linear adversaries from predicting the concept. However, the protection does not transfer to different nonlinear adversaries. Therefore, exhaustively erasing a non-linearly encoded concept remains an open problem.  ( 2 min )
    Retail store customer behavior analysis system: Design and Implementation. (arXiv:2309.03232v1 [cs.LG])
    Understanding customer behavior in retail stores plays a crucial role in improving customer satisfaction by adding personalized value to services. Behavior analysis reveals both general and detailed patterns in the interaction of customers with a store items and other people, providing store managers with insight into customer preferences. Several solutions aim to utilize this data by recognizing specific behaviors through statistical visualization. However, current approaches are limited to the analysis of small customer behavior sets, utilizing conventional methods to detect behaviors. They do not use deep learning techniques such as deep neural networks, which are powerful methods in the field of computer vision. Furthermore, these methods provide limited figures when visualizing the behavioral data acquired by the system. In this study, we propose a framework that includes three primary parts: mathematical modeling of customer behaviors, behavior analysis using an efficient deep learning based system, and individual and group behavior visualization. Each module and the entire system were validated using data from actual situations in a retail store.  ( 2 min )
    PGFed: Personalize Each Client's Global Objective for Federated Learning. (arXiv:2212.01448v2 [cs.LG] UPDATED)
    Personalized federated learning has received an upsurge of attention due to the mediocre performance of conventional federated learning (FL) over heterogeneous data. Unlike conventional FL which trains a single global consensus model, personalized FL allows different models for different clients. However, existing personalized FL algorithms only implicitly transfer the collaborative knowledge across the federation by embedding the knowledge into the aggregated model or regularization. We observed that this implicit knowledge transfer fails to maximize the potential of each client's empirical risk toward other clients. Based on our observation, in this work, we propose Personalized Global Federated Learning (PGFed), a novel personalized FL framework that enables each client to personalize its own global objective by explicitly and adaptively aggregating the empirical risks of itself and other clients. To avoid massive (O(N^2)) communication overhead and potential privacy leakage while achieving this, each client's risk is estimated through a first-order approximation for other clients' adaptive risk aggregation. On top of PGFed, we develop a momentum upgrade, dubbed PGFedMo, to more efficiently utilize clients' empirical risks. Our extensive experiments on four datasets under different federated settings show consistent improvements of PGFed over previous state-of-the-art methods. The code is publicly available at https://github.com/ljaiverson/pgfed.  ( 2 min )
    Auto-SDE: Learning effective reduced dynamics from data-driven stochastic dynamical systems. (arXiv:2205.04151v2 [stat.ML] UPDATED)
    Multiscale stochastic dynamical systems have been widely adopted to scientific and engineering problems due to their capability of depicting complex phenomena in many real world applications. This work is devoted to investigating the effective reduced dynamics for a slow-fast stochastic dynamical system. Given observation data on a short-term period satisfying some unknown slow-fast stochastic system, we propose a novel algorithm including a neural network called Auto-SDE to learn invariant slow manifold. Our approach captures the evolutionary nature of a series of time-dependent autoencoder neural networks with the loss constructed from a discretized stochastic differential equation. Our algorithm is also proved to be accurate, stable and effective through numerical experiments under various evaluation metrics.  ( 2 min )
    A Function Interpretation Benchmark for Evaluating Interpretability Methods. (arXiv:2309.03886v1 [cs.CL])
    Labeling neural network submodules with human-legible descriptions is useful for many downstream tasks: such descriptions can surface failures, guide interventions, and perhaps even explain important model behaviors. To date, most mechanistic descriptions of trained networks have involved small models, narrowly delimited phenomena, and large amounts of human labor. Labeling all human-interpretable sub-computations in models of increasing size and complexity will almost certainly require tools that can generate and validate descriptions automatically. Recently, techniques that use learned models in-the-loop for labeling have begun to gain traction, but methods for evaluating their efficacy are limited and ad-hoc. How should we validate and compare open-ended labeling tools? This paper introduces FIND (Function INterpretation and Description), a benchmark suite for evaluating the building blocks of automated interpretability methods. FIND contains functions that resemble components of trained neural networks, and accompanying descriptions of the kind we seek to generate. The functions are procedurally constructed across textual and numeric domains, and involve a range of real-world complexities, including noise, composition, approximation, and bias. We evaluate new and existing methods that use language models (LMs) to produce code-based and language descriptions of function behavior. We find that an off-the-shelf LM augmented with only black-box access to functions can sometimes infer their structure, acting as a scientist by forming hypotheses, proposing experiments, and updating descriptions in light of new data. However, LM-based descriptions tend to capture global function behavior and miss local corruptions. These results show that FIND will be useful for characterizing the performance of more sophisticated interpretability methods before they are applied to real-world models.  ( 3 min )
    Early warning via transitions in latent stochastic dynamical systems. (arXiv:2309.03842v1 [stat.ML])
    Early warnings for dynamical transitions in complex systems or high-dimensional observation data are essential in many real world applications, such as gene mutation, brain diseases, natural disasters, financial crises, and engineering reliability. To effectively extract early warning signals, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in low-dimensional manifold. Applying the methodology to authentic electroencephalogram (EEG) data, we successfully find the appropriate effective coordinates, and derive early warning signals capable of detecting the tipping point during the state transition. Our method bridges the latent dynamics with the original dataset. The framework is validated to be accurate and effective through numerical experiments, in terms of density and transition probability. It is shown that the second coordinate holds meaningful information for critical transition in various evaluation metrics.  ( 2 min )
    XGen-7B Technical Report. (arXiv:2309.03450v1 [cs.CL])
    Large Language Models (LLMs) have become ubiquitous across various domains, transforming the way we interact with information and conduct research. However, most high-performing LLMs remain confined behind proprietary walls, hindering scientific progress. Most open-source LLMs, on the other hand, are limited in their ability to support longer sequence lengths, which is a key requirement for many tasks that require inference over an input context. To address this, we have trained XGen, a series of 7B parameter models on up to 8K sequence length for up to 1.5T tokens. We have also finetuned the XGen models on public-domain instructional data, creating their instruction-tuned counterparts (XGen-Inst). We open-source our models for both research advancements and commercial applications. Our evaluation on standard benchmarks shows that XGen models achieve comparable or better results when compared with state-of-the-art open-source LLMs. Our targeted evaluation on long sequence modeling tasks shows the benefits of our 8K-sequence models over 2K-sequence open-source LLMs.  ( 2 min )
    ArtHDR-Net: Perceptually Realistic and Accurate HDR Content Creation. (arXiv:2309.03827v1 [cs.CV])
    High Dynamic Range (HDR) content creation has become an important topic for modern media and entertainment sectors, gaming and Augmented/Virtual Reality industries. Many methods have been proposed to recreate the HDR counterparts of input Low Dynamic Range (LDR) images/videos given a single exposure or multi-exposure LDRs. The state-of-the-art methods focus primarily on the preservation of the reconstruction's structural similarity and the pixel-wise accuracy. However, these conventional approaches do not emphasize preserving the artistic intent of the images in terms of human visual perception, which is an essential element in media, entertainment and gaming. In this paper, we attempt to study and fill this gap. We propose an architecture called ArtHDR-Net based on a Convolutional Neural Network that uses multi-exposed LDR features as input. Experimental results show that ArtHDR-Net can achieve state-of-the-art performance in terms of the HDR-VDP-2 score (i.e., mean opinion score index) while reaching competitive performance in terms of PSNR and SSIM.  ( 2 min )
    Subgraph-based Tight Frames on Graphs with Compact Supports and Vanishing Moments. (arXiv:2309.03537v1 [eess.SP])
    In this work, we proposed a novel and general method to construct tight frames on graphs with compact supports based on a series of hierarchical partitions. Starting from our abstract construction that generalizes previous methods based on partition trees, we are able to flexibly incorporate subgraph Laplacians into our design of graph frames. Consequently, our general methods permit adjusting the (subgraph) vanishing moments of the framelets and extra properties, such as directionality, for efficiently representing graph signals with path-like supports. Several variants are explicitly defined and tested. Experimental results show our proposed graph frames perform superiorly in non-linear approximation tasks.  ( 2 min )
    Graph Theory Applications in Advanced Geospatial Research. (arXiv:2309.03249v1 [cs.LG])
    Geospatial sciences include a wide range of applications, from environmental monitoring transportation to infrastructure planning, as well as location-based analysis and services. Graph theory algorithms in mathematics have emerged as indispensable tools in these domains due to their capability to model and analyse spatial relationships efficiently. This technical report explores the applications of graph theory algorithms in geospatial sciences, highlighting their role in network analysis, spatial connectivity, geographic information systems, and various other spatial problem-solving scenarios. It provides a comprehensive idea about the key concepts and algorithms of graph theory that assist the modelling processes. The report provides insights into the practical significance of graph theory in addressing real-world geospatial challenges and opportunities. It lists the extensive research, innovative technologies and methodologies implemented in this field.  ( 2 min )
    Cross-Task Attention Network: Improving Multi-Task Learning for Medical Imaging Applications. (arXiv:2309.03837v1 [cs.CV])
    Multi-task learning (MTL) is a powerful approach in deep learning that leverages the information from multiple tasks during training to improve model performance. In medical imaging, MTL has shown great potential to solve various tasks. However, existing MTL architectures in medical imaging are limited in sharing information across tasks, reducing the potential performance improvements of MTL. In this study, we introduce a novel attention-based MTL framework to better leverage inter-task interactions for various tasks from pixel-level to image-level predictions. Specifically, we propose a Cross-Task Attention Network (CTAN) which utilizes cross-task attention mechanisms to incorporate information by interacting across tasks. We validated CTAN on four medical imaging datasets that span different domains and tasks including: radiation treatment planning prediction using planning CT images of two different target cancers (Prostate, OpenKBP); pigmented skin lesion segmentation and diagnosis using dermatoscopic images (HAM10000); and COVID-19 diagnosis and severity prediction using chest CT scans (STOIC). Our study demonstrates the effectiveness of CTAN in improving the accuracy of medical imaging tasks. Compared to standard single-task learning (STL), CTAN demonstrated a 4.67% improvement in performance and outperformed both widely used MTL baselines: hard parameter sharing (HPS) with an average performance improvement of 3.22%; and multi-task attention network (MTAN) with a relative decrease of 5.38%. These findings highlight the significance of our proposed MTL framework in solving medical imaging tasks and its potential to improve their accuracy across domains.  ( 3 min )
    Companion Animal Disease Diagnostics based on Literal-aware Medical Knowledge Graph Representation Learning. (arXiv:2309.03219v1 [cs.AI])
    Knowledge graph (KG) embedding has been used to benefit the diagnosis of animal diseases by analyzing electronic medical records (EMRs), such as notes and veterinary records. However, learning representations to capture entities and relations with literal information in KGs is challenging as the KGs show heterogeneous properties and various types of literal information. Meanwhile, the existing methods mostly aim to preserve graph structures surrounding target nodes without considering different types of literals, which could also carry significant information. In this paper, we propose a knowledge graph embedding model for the efficient diagnosis of animal diseases, which could learn various types of literal information and graph structure and fuse them into unified representations, namely LiteralKG. Specifically, we construct a knowledge graph that is built from EMRs along with literal information collected from various animal hospitals. We then fuse different types of entities and node feature information into unified vector representations through gate networks. Finally, we propose a self-supervised learning task to learn graph structure in pretext tasks and then towards various downstream tasks. Experimental results on link prediction tasks demonstrate that our model outperforms the baselines that consist of state-of-the-art models. The source code is available at https://github.com/NSLab-CUK/LiteralKG.  ( 2 min )
    EvoCLINICAL: Evolving Cyber-Cyber Digital Twin with Active Transfer Learning for Automated Cancer Registry System. (arXiv:2309.03246v1 [cs.LG])
    The Cancer Registry of Norway (CRN) collects information on cancer patients by receiving cancer messages from different medical entities (e.g., medical labs, and hospitals) in Norway. Such messages are validated by an automated cancer registry system: GURI. Its correct operation is crucial since it lays the foundation for cancer research and provides critical cancer-related statistics to its stakeholders. Constructing a cyber-cyber digital twin (CCDT) for GURI can facilitate various experiments and advanced analyses of the operational state of GURI without requiring intensive interactions with the real system. However, GURI constantly evolves due to novel medical diagnostics and treatment, technological advances, etc. Accordingly, CCDT should evolve as well to synchronize with GURI. A key challenge of achieving such synchronization is that evolving CCDT needs abundant data labelled by the new GURI. To tackle this challenge, we propose EvoCLINICAL, which considers the CCDT developed for the previous version of GURI as the pretrained model and fine-tunes it with the dataset labelled by querying a new GURI version. EvoCLINICAL employs a genetic algorithm to select an optimal subset of cancer messages from a candidate dataset and query GURI with it. We evaluate EvoCLINICAL on three evolution processes. The precision, recall, and F1 score are all greater than 91%, demonstrating the effectiveness of EvoCLINICAL. Furthermore, we replace the active learning part of EvoCLINICAL with random selection to study the contribution of transfer learning to the overall performance of EvoCLINICAL. Results show that employing active learning in EvoCLINICAL increases its performances consistently.  ( 3 min )
  • Open

    Deep Metric Learning with Chance Constraints. (arXiv:2209.09060v3 [cs.CV] CROSS LISTED)
    Deep metric learning (DML) aims to minimize empirical expected loss of the pairwise intra-/inter- class proximity violations in the embedding space. We relate DML to feasibility problem of finite chance constraints. We show that minimizer of proxy-based DML satisfies certain chance constraints, and that the worst case generalization performance of the proxy-based methods can be characterized by the radius of the smallest ball around a class proxy to cover the entire domain of the corresponding class samples, suggesting multiple proxies per class helps performance. To provide a scalable algorithm as well as exploiting more proxies, we consider the chance constraints implied by the minimizers of proxy-based DML instances and reformulate DML as finding a feasible point in intersection of such constraints, resulting in a problem to be approximately solved by iterative projections. Simply put, we repeatedly train a regularized proxy-based loss and re-initialize the proxies with the embeddings of the deliberately selected new samples. We applied our method with 4 well-accepted DML losses and show the effectiveness with extensive evaluations on 4 popular DML benchmarks. Code is available at: https://github.com/yetigurbuz/ccp-dml
    Neural lasso: a unifying approach of lasso and neural networks. (arXiv:2309.03770v1 [stat.ML])
    In recent years, there is a growing interest in combining techniques attributed to the areas of Statistics and Machine Learning in order to obtain the benefits of both approaches. In this article, the statistical technique lasso for variable selection is represented through a neural network. It is observed that, although both the statistical approach and its neural version have the same objective function, they differ due to their optimization. In particular, the neural version is usually optimized in one-step using a single validation set, while the statistical counterpart uses a two-step optimization based on cross-validation. The more elaborated optimization of the statistical method results in more accurate parameter estimation, especially when the training set is small. For this reason, a modification of the standard approach for training neural networks, that mimics the statistical framework, is proposed. During the development of the above modification, a new optimization algorithm for identifying the significant variables emerged. Experimental results, using synthetic and real data sets, show that this new optimization algorithm achieves better performance than any of the three previous optimization approaches.
    Enhancing Pipeline-Based Conversational Agents with Large Language Models. (arXiv:2309.03748v1 [cs.CL])
    The latest advancements in AI and deep learning have led to a breakthrough in large language model (LLM)-based agents such as GPT-4. However, many commercial conversational agent development tools are pipeline-based and have limitations in holding a human-like conversation. This paper investigates the capabilities of LLMs to enhance pipeline-based conversational agents during two phases: 1) in the design and development phase and 2) during operations. In 1) LLMs can aid in generating training data, extracting entities and synonyms, localization, and persona design. In 2) LLMs can assist in contextualization, intent classification to prevent conversational breakdown and handle out-of-scope questions, auto-correcting utterances, rephrasing responses, formulating disambiguation questions, summarization, and enabling closed question-answering capabilities. We conducted informal experiments with GPT-4 in the private banking domain to demonstrate the scenarios above with a practical example. Companies may be hesitant to replace their pipeline-based agents with LLMs entirely due to privacy concerns and the need for deep integration within their existing ecosystems. A hybrid approach in which LLMs' are integrated into the pipeline-based agents allows them to save time and costs of building and running agents by capitalizing on the capabilities of LLMs while retaining the integration and privacy safeguards of their existing systems.
    Truncated Diffusion Probabilistic Models and Diffusion-based Adversarial Auto-Encoders. (arXiv:2202.09671v4 [stat.ML] UPDATED)
    Employing a forward diffusion chain to gradually map the data to a noise distribution, diffusion-based generative models learn how to generate the data by inferring a reverse diffusion chain. However, this approach is slow and costly because it needs many forward and reverse steps. We propose a faster and cheaper approach that adds noise not until the data become pure random noise, but until they reach a hidden noisy data distribution that we can confidently learn. Then, we use fewer reverse steps to generate data by starting from this hidden distribution that is made similar to the noisy data. We reveal that the proposed model can be cast as an adversarial auto-encoder empowered by both the diffusion process and a learnable implicit prior. Experimental results show even with a significantly smaller number of reverse diffusion steps, the proposed truncated diffusion probabilistic models can provide consistent improvements over the non-truncated ones in terms of performance in both unconditional and text-guided image generations.
    Explanation Shift: How Did the Distribution Shift Impact the Model?. (arXiv:2303.08081v2 [cs.LG] UPDATED)
    As input data distributions evolve, the predictive performance of machine learning models tends to deteriorate. In practice, new input data tend to come without target labels. Then, state-of-the-art techniques model input data distributions or model prediction distributions and try to understand issues regarding the interactions between learned models and shifting distributions. We suggest a novel approach that models how explanation characteristics shift when affected by distribution shifts. We find that the modeling of explanation shifts can be a better indicator for detecting out-of-distribution model behaviour than state-of-the-art techniques. We analyze different types of distribution shifts using synthetic examples and real-world data sets. We provide an algorithmic method that allows us to inspect the interaction between data set features and learned models and compare them to the state-of-the-art. We release our methods in an open-source Python package, as well as the code used to reproduce our experiments.
    Comparing Sequential Forecasters. (arXiv:2110.00115v5 [stat.ME] UPDATED)
    Consider two forecasters, each making a single prediction for a sequence of events over time. We ask a relatively basic question: how might we compare these forecasters, either online or post-hoc, while avoiding unverifiable assumptions on how the forecasts and outcomes were generated? In this paper, we present a rigorous answer to this question by designing novel sequential inference procedures for estimating the time-varying difference in forecast scores. To do this, we employ confidence sequences (CS), which are sequences of confidence intervals that can be continuously monitored and are valid at arbitrary data-dependent stopping times ("anytime-valid"). The widths of our CSs are adaptive to the underlying variance of the score differences. Underlying their construction is a game-theoretic statistical framework, in which we further identify e-processes and p-processes for sequentially testing a weak null hypothesis -- whether one forecaster outperforms another on average (rather than always). Our methods do not make distributional assumptions on the forecasts or outcomes; our main theorems apply to any bounded scores, and we later provide alternative methods for unbounded scores. We empirically validate our approaches by comparing real-world baseball and weather forecasters.
    Improved theoretical guarantee for rank aggregation via spectral method. (arXiv:2309.03808v1 [stat.ML])
    Given pairwise comparisons between multiple items, how to rank them so that the ranking matches the observations? This problem, known as rank aggregation, has found many applications in sports, recommendation systems, and other web applications. As it is generally NP-hard to find a global ranking that minimizes the mismatch (known as the Kemeny optimization), we focus on the Erd\"os-R\'enyi outliers (ERO) model for this ranking problem. Here, each pairwise comparison is a corrupted copy of the true score difference. We investigate spectral ranking algorithms that are based on unnormalized and normalized data matrices. The key is to understand their performance in recovering the underlying scores of each item from the observed data. This reduces to deriving an entry-wise perturbation error bound between the top eigenvectors of the unnormalized/normalized data matrix and its population counterpart. By using the leave-one-out technique, we provide a sharper $\ell_{\infty}$-norm perturbation bound of the eigenvectors and also derive an error bound on the maximum displacement for each item, with only $\Omega(n\log n)$ samples. Our theoretical analysis improves upon the state-of-the-art results in terms of sample complexity, and our numerical experiments confirm these theoretical findings.
    BoXHED2.0: Scalable boosting of dynamic survival analysis. (arXiv:2103.12591v5 [cs.LG] UPDATED)
    Modern applications of survival analysis increasingly involve time-dependent covariates. The Python package BoXHED2.0 is a tree-boosted hazard estimator that is fully nonparametric, and is applicable to survival settings far more general than right-censoring, including recurring events and competing risks. BoXHED2.0 is also scalable to the point of being on the same order of speed as parametric boosted survival models, in part because its core is written in C++ and it also supports the use of GPUs and multicore CPUs. BoXHED2.0 is available from PyPI and also from www.github.com/BoXHED.
    Ensemble linear interpolators: The role of ensembling. (arXiv:2309.03354v1 [stat.ML])
    Interpolators are unstable. For example, the mininum $\ell_2$ norm least square interpolator exhibits unbounded test errors when dealing with noisy data. In this paper, we study how ensemble stabilizes and thus improves the generalization performance, measured by the out-of-sample prediction risk, of an individual interpolator. We focus on bagged linear interpolators, as bagging is a popular randomization-based ensemble method that can be implemented in parallel. We introduce the multiplier-bootstrap-based bagged least square estimator, which can then be formulated as an average of the sketched least square estimators. The proposed multiplier bootstrap encompasses the classical bootstrap with replacement as a special case, along with a more intriguing variant which we call the Bernoulli bootstrap. Focusing on the proportional regime where the sample size scales proportionally with the feature dimensionality, we investigate the out-of-sample prediction risks of the sketched and bagged least square estimators in both underparametrized and overparameterized regimes. Our results reveal the statistical roles of sketching and bagging. In particular, sketching modifies the aspect ratio and shifts the interpolation threshold of the minimum $\ell_2$ norm estimator. However, the risk of the sketched estimator continues to be unbounded around the interpolation threshold due to excessive variance. In stark contrast, bagging effectively mitigates this variance, leading to a bounded limiting out-of-sample prediction risk. To further understand this stability improvement property, we establish that bagging acts as a form of implicit regularization, substantiated by the equivalence of the bagged estimator with its explicitly regularized counterpart. We also discuss several extensions.
    Proper Learning of Linear Dynamical Systems as a Non-Commutative Polynomial Optimisation Problem. (arXiv:2002.01444v5 [math.OC] UPDATED)
    There has been much recent progress in forecasting the next observation of a linear dynamical system (LDS), which is known as the improper learning, as well as in the estimation of its system matrices, which is known as the proper learning of LDS. We present an approach to proper learning of LDS, which in spite of the non-convexity of the problem, guarantees global convergence of numerical solutions to a least-squares estimator. We present promising computational results.
    Bridging the Gap Between Target Networks and Functional Regularization. (arXiv:2106.02613v4 [stat.ML] UPDATED)
    Bootstrapping is behind much of the successes of deep Reinforcement Learning. However, learning the value function via bootstrapping often leads to unstable training due to fast-changing target values. Target Networks are employed to stabilize training by using an additional set of lagging parameters to estimate the target values. Despite the popularity of Target Networks, their effect on the optimization is still misunderstood. In this work, we show that they act as an implicit regularizer which can be beneficial in some cases, but also have disadvantages such as being inflexible and can result in instabilities, even when vanilla TD(0) converges. To overcome these issues, we propose an explicit Functional Regularization alternative that is flexible and a convex regularizer in function space and we theoretically study its convergence. We conduct an experimental study across a range of environments, discount factors, and off-policiness data collections to investigate the effectiveness of the regularization induced by Target Networks and Functional Regularization in terms of performance, accuracy, and stability. Our findings emphasize that Functional Regularization can be used as a drop-in replacement for Target Networks and result in performance improvement. Furthermore, adjusting both the regularization weight and the network update period in Functional Regularization can result in further performance improvements compared to solely adjusting the network update period as typically done with Target Networks. Our approach also enhances the ability to networks to recover accurate $Q$-values.
    A Probabilistic Semi-Supervised Approach with Triplet Markov Chains. (arXiv:2309.03707v1 [stat.ML])
    Triplet Markov chains are general generative models for sequential data which take into account three kinds of random variables: (noisy) observations, their associated discrete labels and latent variables which aim at strengthening the distribution of the observations and their associated labels. However, in practice, we do not have at our disposal all the labels associated to the observations to estimate the parameters of such models. In this paper, we propose a general framework based on a variational Bayesian inference to train parameterized triplet Markov chain models in a semi-supervised context. The generality of our approach enables us to derive semi-supervised algorithms for a variety of generative models for sequential Bayesian classification.
    Global Optimization for Cardinality-constrained Minimum Sum-of-Squares Clustering via Semidefinite Programming. (arXiv:2209.08901v3 [math.OC] UPDATED)
    The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has been recently extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to increase performance as well as solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower bound routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small-size instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy which enforces pairwise constraints, we reduce the complexity of the problems arising in the children nodes. For the upper bound, instead, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.
    Empirical Risk Minimization for Losses without Variance. (arXiv:2309.03818v1 [stat.ML])
    This paper considers an empirical risk minimization problem under heavy-tailed settings, where data does not have finite variance, but only has $p$-th moment with $p \in (1,2)$. Instead of using estimation procedure based on truncated observed data, we choose the optimizer by minimizing the risk value. Those risk values can be robustly estimated via using the remarkable Catoni's method (Catoni, 2012). Thanks to the structure of Catoni-type influence functions, we are able to establish excess risk upper bounds via using generalized generic chaining methods. Moreover, we take computational issues into consideration. We especially theoretically investigate two types of optimization methods, robust gradient descent algorithm and empirical risk-based methods. With an extensive numerical study, we find that the optimizer based on empirical risks via Catoni-style estimation indeed shows better performance than other baselines. It indicates that estimation directly based on truncated data may lead to unsatisfactory results.
    Early warning via transitions in latent stochastic dynamical systems. (arXiv:2309.03842v1 [stat.ML])
    Early warnings for dynamical transitions in complex systems or high-dimensional observation data are essential in many real world applications, such as gene mutation, brain diseases, natural disasters, financial crises, and engineering reliability. To effectively extract early warning signals, we develop a novel approach: the directed anisotropic diffusion map that captures the latent evolutionary dynamics in low-dimensional manifold. Applying the methodology to authentic electroencephalogram (EEG) data, we successfully find the appropriate effective coordinates, and derive early warning signals capable of detecting the tipping point during the state transition. Our method bridges the latent dynamics with the original dataset. The framework is validated to be accurate and effective through numerical experiments, in terms of density and transition probability. It is shown that the second coordinate holds meaningful information for critical transition in various evaluation metrics.
    Auto-SDE: Learning effective reduced dynamics from data-driven stochastic dynamical systems. (arXiv:2205.04151v2 [stat.ML] UPDATED)
    Multiscale stochastic dynamical systems have been widely adopted to scientific and engineering problems due to their capability of depicting complex phenomena in many real world applications. This work is devoted to investigating the effective reduced dynamics for a slow-fast stochastic dynamical system. Given observation data on a short-term period satisfying some unknown slow-fast stochastic system, we propose a novel algorithm including a neural network called Auto-SDE to learn invariant slow manifold. Our approach captures the evolutionary nature of a series of time-dependent autoencoder neural networks with the loss constructed from a discretized stochastic differential equation. Our algorithm is also proved to be accurate, stable and effective through numerical experiments under various evaluation metrics.
    Deep Network Approximation: Beyond ReLU to Diverse Activation Functions. (arXiv:2307.06555v3 [cs.LG] UPDATED)
    This paper explores the expressive power of deep neural networks for a diverse range of activation functions. An activation function set $\mathscr{A}$ is defined to encompass the majority of commonly used activation functions, such as $\mathtt{ReLU}$, $\mathtt{LeakyReLU}$, $\mathtt{ReLU}^2$, $\mathtt{ELU}$, $\mathtt{SELU}$, $\mathtt{Softplus}$, $\mathtt{GELU}$, $\mathtt{SiLU}$, $\mathtt{Swish}$, $\mathtt{Mish}$, $\mathtt{Sigmoid}$, $\mathtt{Tanh}$, $\mathtt{Arctan}$, $\mathtt{Softsign}$, $\mathtt{dSiLU}$, and $\mathtt{SRS}$. We demonstrate that for any activation function $\varrho\in \mathscr{A}$, a $\mathtt{ReLU}$ network of width $N$ and depth $L$ can be approximated to arbitrary precision by a $\varrho$-activated network of width $4N$ and depth $2L$ on any bounded set. This finding enables the extension of most approximation results achieved with $\mathtt{ReLU}$ networks to a wide variety of other activation functions, at the cost of slightly larger constants.
    Mixtures of Gaussians are Privately Learnable with a Polynomial Number of Samples. (arXiv:2309.03847v1 [stat.ML])
    We study the problem of estimating mixtures of Gaussians under the constraint of differential privacy (DP). Our main result is that $\tilde{O}(k^2 d^4 \log(1/\delta) / \alpha^2 \varepsilon)$ samples are sufficient to estimate a mixture of $k$ Gaussians up to total variation distance $\alpha$ while satisfying $(\varepsilon, \delta)$-DP. This is the first finite sample complexity upper bound for the problem that does not make any structural assumptions on the GMMs. To solve the problem, we devise a new framework which may be useful for other tasks. On a high level, we show that if a class of distributions (such as Gaussians) is (1) list decodable and (2) admits a "locally small'' cover [BKSW19] with respect to total variation distance, then the class of its mixtures is privately learnable. The proof circumvents a known barrier indicating that, unlike Gaussians, GMMs do not admit a locally small cover [AAL21].
    A Tutorial on the Non-Asymptotic Theory of System Identification. (arXiv:2309.03873v1 [eess.SY])
    This tutorial serves as an introduction to recently developed non-asymptotic methods in the theory of -- mainly linear -- system identification. We emphasize tools we deem particularly useful for a range of problems in this domain, such as the covering technique, the Hanson-Wright Inequality and the method of self-normalized martingales. We then employ these tools to give streamlined proofs of the performance of various least-squares based estimators for identifying the parameters in autoregressive models. We conclude by sketching out how the ideas presented herein can be extended to certain nonlinear identification problems.
    Gradient-Based Feature Learning under Structured Data. (arXiv:2309.03843v1 [stat.ML])
    Recent works have demonstrated that the sample complexity of gradient-based learning of single index models, i.e. functions that depend on a 1-dimensional projection of the input data, is governed by their information exponent. However, these results are only concerned with isotropic data, while in practice the input often contains additional structure which can implicitly guide the algorithm. In this work, we investigate the effect of a spiked covariance structure and reveal several interesting phenomena. First, we show that in the anisotropic setting, the commonly used spherical gradient dynamics may fail to recover the true direction, even when the spike is perfectly aligned with the target direction. Next, we show that appropriate weight normalization that is reminiscent of batch normalization can alleviate this issue. Further, by exploiting the alignment between the (spiked) input covariance and the target, we obtain improved sample complexity compared to the isotropic case. In particular, under the spiked model with a suitably large spike, the sample complexity of gradient-based training can be made independent of the information exponent while also outperforming lower bounds for rotationally invariant kernel methods.
    Knowledge Distillation Layer that Lets the Student Decide. (arXiv:2309.02843v1 [cs.CV] CROSS LISTED)
    Typical technique in knowledge distillation (KD) is regularizing the learning of a limited capacity model (student) by pushing its responses to match a powerful model's (teacher). Albeit useful especially in the penultimate layer and beyond, its action on student's feature transform is rather implicit, limiting its practice in the intermediate layers. To explicitly embed the teacher's knowledge in feature transform, we propose a learnable KD layer for the student which improves KD with two distinct abilities: i) learning how to leverage the teacher's knowledge, enabling to discard nuisance information, and ii) feeding forward the transferred knowledge deeper. Thus, the student enjoys the teacher's knowledge during the inference besides training. Formally, we repurpose 1x1-BN-ReLU-1x1 convolution block to assign a semantic vector to each local region according to the template (supervised by the teacher) that the corresponding region of the student matches. To facilitate template learning in the intermediate layers, we propose a novel form of supervision based on the teacher's decisions. Through rigorous experimentation, we demonstrate the effectiveness of our approach on 3 popular classification benchmarks. Code is available at: https://github.com/adagorgun/letKD-framework
    Trinary Decision Trees for missing value handling. (arXiv:2309.03561v1 [stat.ML])
    This paper introduces the Trinary decision tree, an algorithm designed to improve the handling of missing data in decision tree regressors and classifiers. Unlike other approaches, the Trinary decision tree does not assume that missing values contain any information about the response. Both theoretical calculations on estimator bias and numerical illustrations using real data sets are presented to compare its performance with established algorithms in different missing data scenarios (Missing Completely at Random (MCAR), and Informative Missingness (IM)). Notably, the Trinary tree outperforms its peers in MCAR settings, especially when data is only missing out-of-sample, while lacking behind in IM settings. A hybrid model, the TrinaryMIA tree, which combines the Trinary tree and the Missing In Attributes (MIA) approach, shows robust performance in all types of missingness. Despite the potential drawback of slower training speed, the Trinary tree offers a promising and more accurate method of handling missing data in decision tree algorithms.
    Copula Representations and Error Surface Projections for the Exclusive Or Problem. (arXiv:1907.04483v2 [cs.LG] UPDATED)
    The exclusive or (xor) function is one of the simplest examples that illustrate why nonlinear feedforward networks are superior to linear regression for machine learning applications. We review the xor representation and approximation problems and discuss their solutions in terms of probabilistic logic and associative copula functions. After briefly reviewing the specification of feedforward networks, we compare the dynamics of learned error surfaces with different activation functions such as RELU and tanh through a set of colorful three-dimensional charts. The copula representations extend xor from Boolean to real values, thereby providing a convenient way to demonstrate the concept of cross-validation on in-sample and out-sample data sets. Our approach is pedagogical and is meant to be a machine learning prolegomenon.
    Causal thinking for decision making on Electronic Health Records: why and how. (arXiv:2308.01605v3 [stat.ME] UPDATED)
    Accurate predictions, as with machine learning, may not suffice to provide optimal healthcare for every patient. Indeed, prediction can be driven by shortcuts in the data, such as racial biases. Causal thinking is needed for data-driven decisions. Here, we give an introduction to the key elements, focusing on routinely-collected data, electronic health records (EHRs) and claims data. Using such data to assess the value of an intervention requires care: temporal dependencies and existing practices easily confound the causal effect. We present a step-by-step framework to help build valid decision making from real-life patient records by emulating a randomized trial before individualizing decisions, eg with machine learning. Our framework highlights the most important pitfalls and considerations in analysing EHRs or claims data to draw causal conclusions. We illustrate the various choices in studying the effect of albumin on sepsis mortality in the Medical Information Mart for Intensive Care database (MIMIC-IV). We study the impact of various choices at every step, from feature extraction to causal-estimator selection. In a tutorial spirit, the code and the data are openly available.
    Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck. (arXiv:2309.03800v1 [cs.LG])
    This work investigates the nuanced algorithm design choices for deep learning in the presence of computational-statistical gaps. We begin by considering offline sparse parity learning, a supervised classification problem which admits a statistical query lower bound for gradient-based training of a multilayer perceptron. This lower bound can be interpreted as a multi-resource tradeoff frontier: successful learning can only occur if one is sufficiently rich (large model), knowledgeable (large dataset), patient (many training iterations), or lucky (many random guesses). We show, theoretically and experimentally, that sparse initialization and increasing network width yield significant improvements in sample efficiency in this setting. Here, width plays the role of parallel search: it amplifies the probability of finding "lottery ticket" neurons, which learn sparse features more sample-efficiently. Finally, we show that the synthetic sparse parity task can be useful as a proxy for real problems requiring axis-aligned feature learning. We demonstrate improved sample efficiency on tabular classification benchmarks by using wide, sparsely-initialized MLP models; these networks sometimes outperform tuned random forests.
    Medoid Silhouette clustering with automatic cluster number selection. (arXiv:2309.03751v1 [cs.LG])
    The evaluation of clustering results is difficult, highly dependent on the evaluated data set and the perspective of the beholder. There are many different clustering quality measures, which try to provide a general measure to validate clustering results. A very popular measure is the Silhouette. We discuss the efficient medoid-based variant of the Silhouette, perform a theoretical analysis of its properties, provide two fast versions for the direct optimization, and discuss the use to choose the optimal number of clusters. We combine ideas from the original Silhouette with the well-known PAM algorithm and its latest improvements FasterPAM. One of the versions guarantees equal results to the original variant and provides a run speedup of $O(k^2)$. In experiments on real data with 30000 samples and $k$=100, we observed a 10464$\times$ speedup compared to the original PAMMEDSIL algorithm. Additionally, we provide a variant to choose the optimal number of clusters directly.
    Adversarially Robust Deep Learning with Optimal-Transport-Regularized Divergences. (arXiv:2309.03791v1 [cs.LG])
    We introduce the $ARMOR_D$ methods as novel approaches to enhancing the adversarial robustness of deep learning models. These methods are based on a new class of optimal-transport-regularized divergences, constructed via an infimal convolution between an information divergence and an optimal-transport (OT) cost. We use these as tools to enhance adversarial robustness by maximizing the expected loss over a neighborhood of distributions, a technique known as distributionally robust optimization. Viewed as a tool for constructing adversarial samples, our method allows samples to be both transported, according to the OT cost, and re-weighted, according to the information divergence. We demonstrate the effectiveness of our method on malware detection and image recognition applications and find that, to our knowledge, it outperforms existing methods at enhancing the robustness against adversarial attacks. $ARMOR_D$ yields the robustified accuracy of $98.29\%$ against $FGSM$ and $98.18\%$ against $PGD^{40}$ on the MNIST dataset, reducing the error rate by more than $19.7\%$ and $37.2\%$ respectively compared to prior methods. Similarly, in malware detection, a discrete (binary) data domain, $ARMOR_D$ improves the robustified accuracy under $rFGSM^{50}$ attack compared to the previous best-performing adversarial training methods by $37.0\%$ while lowering false negative and false positive rates by $51.1\%$ and $57.53\%$, respectively.
    On the dynamics of multi agent nonlinear filtering and learning. (arXiv:2309.03557v1 [stat.ML])
    Multiagent systems aim to accomplish highly complex learning tasks through decentralised consensus seeking dynamics and their use has garnered a great deal of attention in the signal processing and computational intelligence societies. This article examines the behaviour of multiagent networked systems with nonlinear filtering/learning dynamics. To this end, a general formulation for the actions of an agent in multiagent networked systems is presented and conditions for achieving a cohesive learning behaviour is given. Importantly, application of the so derived framework in distributed and federated learning scenarios are presented.  ( 2 min )

  • Open

    Differentially private median and more
    Posted by Edith Cohen and Uri Stemmer, Research Scientists, Google Research Differential privacy (DP) is a rigorous mathematical definition of privacy. DP algorithms are randomized to protect user data by ensuring that the probability of any particular output is nearly unchanged when a data point is added or removed. Therefore, the output of a DP algorithm does not disclose the presence of any one data point. There has been significant progress in both foundational research and adoption of differential privacy with contributions such as the Privacy Sandbox and Google Open Source Library. ML and data analytics algorithms can often be described as performing multiple basic computation steps on the same dataset. When each such step is differentially private, so is the output, but with …  ( 94 min )
  • Open

    Instacart boosts AI capacity, readies for IPO with OpenAI's ChatGPT-powered eCommerce search
    On the verge of its IPO, Instacart has introduced major AI-powered features to its Storefront platform and the smart Caper Carts. Main upgrades: conversational search powered by OpenAI's ChatGPT and inbuilt AI models. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/olqtxvwjo3nb1.png?width=750&format=png&auto=webp&s=d8eaefbb9865c51732efc2792ec386610ecd38e6 AI advancements in Instacart's infrastructure Instacart, which holds approximately 22% of the $132 billion US online grocery-delivery market, has been leaning more towards being a tech platform. The new Instacart Storefront, entailing features driven by 150 proprietary AI models, is built on the same core infrastructure as the Instacart app. Customers can engage in open-ended searches on retailers' storefronts via the search bar. AI upgrades in Caper Carts AI-powered Caper Carts by Instacart have been upgraded. Customers can now order directly from their Caper Cart and get informed when their orders are ready. Camera and weight sensor efficiency is enhanced thanks to improved AI models, ensuring a smoother shopping journey and providing an extra layer of security against suspicious activity. (source) P.S. If you want this kind of analysis, delve into the latest updates in AI with our free newsletter, already favored by professionals from Google, Meta, and OpenAI. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    AI girlfriend ads are flooding Instagram and TikTok
    Tech startups are running sexually explicit ads for apps promoting not-safe-for-work experiences on platforms like Facebook, Instagram, and TikTok. These ads feature digitally created potential 'girlfriends' with large breasts and tight clothing, and some even use popular children's TV characters to promote 'NSFW pics' apps. NBC News found 35 app developers running sexually explicit ads on Meta-owned apps, and 14 app developers running similar ads on TikTok. The marketing push is part of an AI gold rush, capitalizing on the surge of interest in AI and benefiting from a double standard that hurts real human sex workers. Researchers believe that the gender-based slant in these ads reflects social media platforms allowing sex-related ads only if the intended audience is men. Meta and TikTok have stepped up their removal of sexually explicit AI ads after NBC News contacted them, but questions remain about how the ads got through their filters in the first place. Similar ads also appear in the Apple and Google app stores, although the extent of advertising there is unknown. Source : https://www.nbcnews.com/tech/social-media/ai-girlfriend-ads-instagram-tiktok-chat-pics-chatgpt-dose-rcna97547 submitted by /u/NuseAI [link] [comments]  ( 9 min )
    AI — weekly megathread!
    News provided by aibrews.com Technology Innovation Institute in Abu Dhabi has released Falcon 180B - a large language model with 180 billion parameters, trained on 3.5 trillion tokens. It's currently the largest openly available model, and rivals proprietary models like PaLM-2. Falcon 180B is 2.5 times larger than Llama 2 and was trained with 4x more compute. It is available for both research and commercial use [Details]. Meta AI released Belebele, a first-of-its-kind multilingual reading comprehension dataset spanning 122 language variants, enabling direct comparison of how well models understand different languages [Details]. Meta AI has published Code Llama’s research paper with more information on training, evaluation results and safety [Paper]. Open Interpreter, an open-source, …  ( 10 min )
    Animating a 2D image in real time
    Hello Everyone, i have recently started working on a project, where I need to animate an image of a face in real time to speak sentences. Essentially I am trying to build a face for my own large language model. I know of Nvidia's Audio2Face and Metahuman, but these are all in 3D and take a lot of time rendering the lip and eye animations. I need something, which works only with a bit of latency. ​ Does anyone know a service or a repo I could use to animate a 2D picture to speak text? submitted by /u/Fabianslife [link] [comments]  ( 9 min )
    Free AI transforms text and images into amazing videos - Pika Labs
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    Would ChatGPT work to help with looking for WFH jobs?/changing careers?
    This is a complete ChatGPT beginner question but has anyone ever downloaded it and used it to help with looking for specific job roles? Mainly WFH related? Or thought about changing careers and used ChatGPT to help with that? I know there are a lot of other ways to go about this but would ChatGPT help with this at all? submitted by /u/jackbowls [link] [comments]  ( 9 min )
    Do you feel endangered by the rise of AI?
    View Poll submitted by /u/MiladMansory [link] [comments]  ( 9 min )
    AI grading and AI screening but no AI for homework/assignments/exam?
    Professors send emails explaining that they use AI but they reviewed the grades from AI to make sure everything is fine. But students can’t use AI and then review the results just make sure everything is fine. submitted by /u/PrettyHappyAndGay [link] [comments]  ( 9 min )
    One-Minute Daily AI News 8/7/2023
    A new AI tool developed by startup Delphi allows users to create virtual clones of themselves or anyone else. Users can upload an ID and add various files, such as emails, chat transcripts, and videos, to generate an AI chatbot that mimics their personality.[1] OpenAI will host its first developer conference on November 6.[2] Meta Platforms Inc. today released FACET, a benchmark dataset designed to help researchers audit computer vision models for bias.[3] Australia to require AI-made child abuse material be removed from search results.[4] Sources: [1] https://technotrenz.com/news/a-new-ai-service-allows-for-the-creation-of-a-virtual-version-of-yourself-or-a-loved-one-that-is-capable-of-making-phone-calls-on-your-behalf-2772634.html [2] https://techcrunch.com/2023/09/06/openai-will-host-its-first-developer-conference-on-november-6/ [3] https://siliconangle.com/2023/08/31/meta-releases-facet-dataset-evaluating-ai-fairness/ [4] https://www.reuters.com/technology/australia-require-ai-made-child-abuse-material-be-removed-search-results-2023-09-08/ submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    AGI will be not feasible any time soon, here's why
    I was thinking today about all the AI hype we have right with somewhat a bunch of new breakthroughs each month, but things not only are getting slower updates, but the updates impacts itself are becoming lesser. If that is not enough, well we have big problems ahead, such as processors are reaching the physical limit, quantum effects disrupting the works, wafers becoming increasing more expensive, the size reduction is no longer adding the same boosts in power and new materials are just far from viable. On top of this we are going meet two other walls, the software and the energy. About the first, as we make better and more complex algorithms for computation the harder it gets to make better ones to squeeze more power and handle more complex tasks. The second, is becoming more real as bi…  ( 12 min )
  • Open

    [P] MLOps for Vercel OpenAI chatbot infrastructure
    I used infrastructure as code (IaC) to provision and deploy Vercel's next-openai example. IaC is useful because it applies the same rigor of application code development to infrastructure provisioning. Instead of manual point and click in a cloud console which can be unrepeatable or error-prone, you just store and change all infrastructure configurations as code in source control . This example uses Pulumi which allows you to write the IaC in Python. https://github.com/aaronkao/vercel-py-openai-chatbot submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    Why Do You Not Use Open Source LLMs? (Or do you?) [D] (Repost because I made a mistake in the title)
    Reposting because I intended to ask about LLMs, not AI in general, and forgot that I don't need to dumb down the terminology for this sub. Thanks to the people who pointed out that mistake. --- original post --- This is something I'm curious about. I've seen a few people declaring that they're not using open source LLMs because they're GPU-poor, because the models aren't good enough, because the uis/frontends are hard to get started with, etc., and I've been wondering how much these comments and posts reflect the opinions and needs of the community as a whole. So, here's a poll. Answer away if you feel like it. I'm sharing this on a few other subs too (for the sake of greater information gathering) so please don't vote more than once. If your reasoning is not on here, feel free to comment your thoughts. If more than one option describes you, please select the one that describes you the most. View Poll submitted by /u/Heralax_Tekran [link] [comments]  ( 9 min )
    [D] What are good resources for creating NLP algorithms from scratch?
    I'm looking to learn more about concurrency/parallelism, optimization, data structures and algorithms from an NLP perspective. submitted by /u/Al_Miksiki [link] [comments]  ( 9 min )
    [D] Please Help - Machine Learning (ML) Engineers
    Hello Everyone, I'm currently exploring the idea of a solution tailored for ML engineers and technologists. While I have a background in recruiting, I've often found myself dissatisfied with the typical recruitment process. It seems that many recruiters don't always appreciate the importance of working with candidates or understand the impact on people's livelihoods and careers. What I'm proposing is the creation of a career representation firm specifically designed for purpose-driven technologists specializing in data, product, and hardware careers. This firm would advocate for the career interests of the most passionate ML engineers. Our representation would encompass: - Strategic Career Development: Crafting a strategic approach to help engineers secure opportunities aligned with their desired projects and professional development. - Impact Matching: Identifying and connecting engineers with projects and teams where their technical skills, career goals, and personal interests can have the greatest positive impact, ensuring that your work aligns with your values and aspirations. - Industry Leadership: Positioning you as an industry leader by marketing your expertise and securing speaking engagements at conferences and other events, enhancing your professional visibility and reputation. In return for this representation, engineers would commit to a 3% fee deducted from their salary, which would support the services provided by the firm. Would you be interested in participating in such a service? If not, would you consider recommending it to someone you know? If you are in favor of this idea, what makes you believe it would be advantageous for others even if it might not be your preference? Do you think you could personally benefit from this type of career representation? Thanks! submitted by /u/Educational_Bar_6352 [link] [comments]  ( 10 min )
    [R] Algorithm of Thoughts Prompt Engineering Breakdown
    Paper: https://arxiv.org/abs/2308.10379 Saw someone else post about this new prompting method on the sub here so I decided to put together a run down and prompt template. Pretty interesting to see the different methods emerge and how some attempt to simulate how code runs. My rundown -> https://www.prompthub.us/blog/how-algorithm-of-thoughts-prompting-works submitted by /u/dancleary544 [link] [comments]  ( 9 min )
    [R][D] How to implement Sinusoidal Positional Embedding?
    Hi fellow computer scientists, so I've been researching a little about transformers and meanwhile I had to understand sinusoidal positional embedding. I have found two implementations for this, after testing both approaches I found they compute different embeddings for the same position/timestep with the same embedding dimensions... shouldn't it be equal if the position and embedding dimensions are the same? This is getting me confused, because now I don't know which implementation should I consider... Do you have any suggestions to where I can look? Thank you :) submitted by /u/Christs_Elite [link] [comments]  ( 9 min )
    "[Discussion]"
    Hi guys, I'm completely new in this field.. I have a research in civil engineering and need to learn python, machine learning and data analysis as short as possible. Where can I achieve that?? please help me by naming the best courses or any free materials available🙏 submitted by /u/Ok-Upstairs7749 [link] [comments]  ( 9 min )
    [P] Question answering based on book-summaries
    I'm one of those people who always ask questions about movies because there's something they don't get or have forgotten. Especially with more complex stories, like Game of Thrones. At the moment I'm reading Wheel of Time, a rather long fantasy series. I had the idea to build the following WebApp: There is online each chapter of the series summarized separately. So in the WebApp I could ask questions about the content. In addition, I can indicate which chapter I am reading, so that it is ensured not to spoil the user. I want to avoid to train a model. I would prefer to use one of the existing open-source models, like llama. A first, primitive idea: give the LLM all the summaries and the user's question. But this would mean to give all summaries as input every time. Not only that this approach would not be elegant, the restriction in the input size (number of words) would make this possibly even impossible. Feel free to share your ideas how i could solve this :) submitted by /u/Individual-Cause-616 [link] [comments]  ( 9 min )
    Help me with creating dataset from .mat files [D]
    I have so many .mat files in a folder which have two arrays inside each .mat file. that is, for each .mat file, i have a (224*224) array and another (136,1) array. These 224*224 arrays are my X_trains for a model and these corresponding 136*1 arrays are my y_trains (labels). i can read these files as np arrays using scipy's loadmat. My problem is, is there a way to usen tf.data .Dataset object to send these to a model or there is any other way? Also using this tf.data.Dataset can i split into train, test, val data? submitted by /u/likhith-69 [link] [comments]  ( 9 min )
    [P] CLI tool to benchmark 100+LLMs response, response time, cost
    Hi r/MachineLearning, I built a CLI tool to benchmark 100+ LLMs for a given question. Benchmark output allows you to compare responses, response time and cost. Try it here: https://github.com/BerriAI/litellm/blob/main/cookbook/benchmark/readme.md CLI Output: Output from CLI Tool Simply select your LLMs, enter your API keys, LLM configs and run python3 benchmark.py Happy completion()! submitted by /u/Comfortable_Dirt5590 [link] [comments]  ( 9 min )
    Text summarization [P]
    Hey! If anyone has worked with text summarization before especially with TF-IDF and extractive summarization,kindly please dm me. Hope you have a great day! submitted by /u/Ok-Avocado-5370 [link] [comments]  ( 9 min )
    [P] A look at Apple’s new Transformer-powered predictive text model
    In the upcoming versions of macOS and iOS, Apple is including a predictive text model which offers suggestions while you type, which they’ve said to be a "transformer model". I managed to find some details about this model, including details about its topology and tokenizer, and I was even able to peek in and see several of its top predictions while typing! Blogpost: https://jackcook.com/2023/09/08/predictive-text.html Source code: https://github.com/jackcook/predictive-spy Hopefully this can give some insight into some of the trade-offs that Apple went through to put a model on every iPhone and MacBook — it’s small, it has a pretty narrow scope, and it’s not very capable on its own. Let me know what you think! submitted by /u/jackcook [link] [comments]  ( 9 min )
    [P] AI Beats Hockolicious, Trackmania's Most Prestigious Map
    Follow-up on our previous post (Vision-based reinforcement learning for Trackmania: close or at superhuman level). Many comments rightfully pointed that the map we trained on: - lacked difficult features like jumps, airbrakes, drifts, ... - had not widely been played by humans We have now trained the same AI on the game's most prestigious map: Hockolicious. We also prepared a video describing the approach with much more detail. Here is our result :) AI Beats Hockolicious, Trackmania's Most Prestigious Map Note: We are still using a convolutional neural network with a structure similar to Nature's DQN paper. I am curious whether other architectures (the ResNet-like in the IMPALA paper ?) could help. Do you have any suggestions on how the neural network's vision head should be structured for that specific task? submitted by /u/Linesight_rl [link] [comments]  ( 9 min )
    [D] Methodology for counting/segmenting objects in close formations
    Hello all. I'm new to object recognition and instance segmentation. I am trying to work on a project in which I use drone imagery to detect objects that are in close formations with each other. I do this for the purpose of counting particular objects, as well as to check if an object has moved (by making a prediction on drone imagery that is taken later). Create masks? I'm now trying to understand what methodology/models make sense. First of all, should I be looking at creating masks, or do bounding boxes suffice? My idea was that masks are better, since bounding boxes overlap with each other and can miss that an object has moved slightly, Or am I wrong and are masks just an extra hassle? Or shouldn't I be looking at bounding boxes or masks at all? MaskRCNN? Model-wise, should I be lo…  ( 10 min )
    [D] Chains and Agents
    I think there's a lot of confusion around AI agents today and it's mainly because of lack of definition and using the wrong terminology. We've been talking to many companies who are claiming they're working on agents but when you look under the hood, they are really just chains. I just listened to the Latent Space pod with Harrison Chase (Founder of Langchain) and I really liked how he thinks about chains vs agents. Chains: sequence of tasks in a more rigid order, where you have more control, more predictability. Agents: handling the edge-cases, the long-tail of things that can happen. And the most important thing is that it's not an OR question but an AND one: you can use them in the same application by starting with chains -> figuring our the edge-cases -> using agents to deal with them. https://preview.redd.it/l59sc4sri0nb1.png?width=3127&format=png&auto=webp&s=1f3f8730c48687eaabf1f554deb181cf35b96036 submitted by /u/BootstrapGuy [link] [comments]  ( 9 min )
    [D] Question for Jensen Huang
    I have the opportunity to see Jensen speak in the next month at a semi private event, 250-300 people. I will probably have the opportunity to ask him a question. What would you ask him? submitted by /u/Zealousideal-Food285 [link] [comments]  ( 9 min )
    [D] Object detection in 3D
    Greetings, people. My colleague told me about some methods of object detection/classification on 3D models, and now I'm exploring them. But during my research I couldn't find that much information about them. I would like to ask you to provide me information, literature and examples of application for them. I remember that one of the techniques is called voxelization. But still not able to find great and intuitive example. Would be thankful for any information :) submitted by /u/thattallsoldier [link] [comments]  ( 9 min )
    [D] What object detection and segmentation model repos do you folks use for production
    Looking at all the popular yolo repos, v5, v7,v8, yolo-nas, all of them seem to have restrictive licenses (gpl3, agpl, apache 2) where the trained model files also falls under the license. What do people usually use to deploy detection/segmentation in production, especially with resource constraints (can't use something like fast-rcnn) submitted by /u/Appropriate_Bear_894 [link] [comments]  ( 9 min )
    [D] Proper use of ai-voice-cloning / rvc / tortoise
    Hey guys! I need some help here.. many days trying to get good results but without success. So I already have the voice I want to use (edited with uvr5 and it sounds really great, without any echo or noise on the background), I trained it on aivc so that I can generate this voice verbalizing the text content I need. I used high quality - it took like 40min to generate each phrase - and it is ok, but still a little robotic. So I installed RVC and trained a model with the original voice (edited with the uvr5) just like I did the training on aivc. So I loaded the trained model on the inference tab and I selected the audio to be processed - the generated audio files from aivc. Even selecting the harvest mode, the output was worse than the generated files from aivc. I even tried to record my own voice speaking the text but it does not sound good. My trained model on rvc has 500 epochs, and it may be a very good model to use, yet idk what I’m doing wrong. Maybe I’m misusing rvc, so what I need is to improve the realism of my aivc(or tortoise) generated voices, simple as that, is rvc the best option to do this? If yes, how? Any help please would be much appreciated thanks! submitted by /u/JustSayin_thatuknow [link] [comments]  ( 9 min )
    [R] FLM-101B: An Open LLM and How to Train It with $100K Budget
    submitted by /u/hzj5790 [link] [comments]  ( 9 min )
    [R] Seeking Research Papers on Weight Manipulation in Physics-Informed Neural Networks (PINNs)
    Would you kindly share noteworthy papers that have caught your interest concerning the weights of physics-informed neural networks (PINNs)? I am looking for any innovative paper that has something to do with weights of the physics-informed neural networks or deep neural networks in general and its manipulation. Specifically, I am seeking innovative papers on weight manipulation in physics-informed neural networks. For instance papers like: Weight initialization algorithm for physics-informed neural networks using finite differences Transfer Learning with Physics-Informed Neural Networks for Efficient Simulation of Branched Flows Note that I am referring to the actual weights of the neural network and not the weights of the loss terms. I have to add that ideas from transfer learning are welcome too. submitted by /u/ai_physics2023 [link] [comments]  ( 9 min )
    [P][R] Finetune LLMs via the Finetuning Hub
    Hi ML community, I have been working on benchmarking publicly available LLMs these past couple of weeks. More precisely, I am interested on the finetuning piece since a lot of businesses are starting to entertain the idea of self-hosting LLMs trained on their proprietary data rather than relying on third party APIs. To this point, I am tracking the following 4 pillars of evaluation that businesses are typically look into: - Performance - Time to train an LLM - Cost to train an LLM - Inference (throughput / latency / cost per token) For each LLM, my aim is to benchmark them for popular tasks, i.e., classification and summarization. Moreover, I would like to compare them against each other. So far, I have benchmarked Flan-T5-Large, Falcon-7B and RedPajama and have found them to be very efficient in low-data situations, i.e., when there are very few annotated samples. Llama2-7B/13B and Writer’s Palmyra are in the pipeline. But there’s so many LLMs out there! In case this work interests you, would be great to join forces. GitHub repo attached — feedback is always welcome :) https://github.com/georgian-io/LLM-Finetuning-Hub Happy hacking! submitted by /u/l-llm [link] [comments]  ( 9 min )
  • Open

    AI pilot programs look to reduce energy use and emissions on MIT campus
    A cross-departmental team is leading efforts to utilize machine learning for increased efficiency in heating and cooling MIT’s buildings.  ( 10 min )
    Jackson Jewett wants to design buildings that use less concrete
    The PhD student is honing algorithms for designing large structures with less material — helping to shrink the construction industry’s huge carbon footprint.  ( 10 min )
  • Open

    Can't solve Gymnasium Frozenlake-v1 8x8 with A2C
    Hello, I'm trying to solve the Frozenlake-v1 environment with is_slippery = True (non-deterministic) with the stable baselines 3 A2C algorithm. I can solve the 4x4 version but I can't achieve any results with the 8x8 version. I also checked the RL-Zoo to see if there is any hyperparameter tunning about that environment but there is nothing. Which adjustments can I do to make it work properly? submitted by /u/MetallicaSPA [link] [comments]  ( 9 min )
    RL in games
    Hello guys, I was suddenly inspired to make a WH Gladius bot. Background: I recently got into the game, it seems very interesting to me personally, but alas, there are not enough guides on it for you to learn how to play at a high level. I don’t intend to spend hundreds of hours to master the base, so I decided to try something like RARL so that this thing would learn, and I could analyze its moves, change the conditions and thus start playing at an intermediate level faster. However, a superficial analysis revealed that the game does not have an API at all. Let's say I could grab some stats using Cheat Engine and OllyDbg, but I have no idea how to fit it into the gym. Or does gym as env need to pass a link to the client from the machine so that it only restarts it? In general, if anyone has done something similar, I ask for a link to a guide or a similar example. All the best submitted by /u/kapedalex [link] [comments]  ( 9 min )
    AI Beats Hockolicious, Trackmania's Most Prestigious Map
    Follow-up on our previous post (Vision-based reinforcement learning for Trackmania: close or at superhuman level). Many comments rightfully pointed that the map we trained on: - lacked difficult features like jumps, airbrakes, drifts, ... - had not widely been played by humans We have now trained the same AI on the game's most prestigious map: Hockolicious. We also prepared a video describing the approach with much more detail. Here is our result :) AI Beats Hockolicious, Trackmania's Most Prestigious Map Note: We are still using a convolutional neural network with a structure similar to Nature's DQN paper. I am curious whether other architectures (the ResNet-like in the IMPALA paper ?) could help. Do you have any suggestions on how the neural network's vision head should be structured for that specific task? submitted by /u/Linesight_rl [link] [comments]  ( 9 min )
    Difference between experience replay and multi time-step inputs.
    In DQN, if I want to train a model which takes into account of the current state and previous k states, do I use consecutive experience replay to achieve this or should I implement a DNN with multi time-step inputs? Is the latter allowed, considering the Markov assumption from MDP update? I only have a superficial understanding on the purpose of experience replay, which is used to stabalise the training process and break correlations from consecutive training samples. submitted by /u/cj_1993 [link] [comments]  ( 9 min )
  • Open

    NVIDIA Partners With India Giants to Advance AI in World’s Most Populous Nation
    The world’s largest democracy is poised to transform itself and the world, embracing AI on an enormous scale. Speaking with the press Friday in Bengaluru, in the context of announcements from two of India’s largest conglomerates, Reliance Industries Limited and Tata Group, NVIDIA founder and CEO Jensen Huang detailed plans to bring AI technology and Read article >  ( 6 min )
  • Open

    Implement smart document search index with Amazon Textract and Amazon OpenSearch
    In this post, we’ll take you on a journey to rapidly build and deploy a document search indexing solution that helps your organization to better harness and extract insights from documents. Whether you're in Human Resources looking for specific clauses in employee contracts, or a financial analyst sifting through a mountain of invoices to extract payment data, this solution is tailored to empower you to access the information you need with unprecedented speed and accuracy.  ( 11 min )
    Semantic image search for articles using Amazon Rekognition, Amazon SageMaker foundation models, and Amazon OpenSearch Service
    Digital publishers are continuously looking for ways to streamline and automate their media workflows in order to generate and publish new content as rapidly as they can. Publishers can have repositories containing millions of images and in order to save money, they need to be able to reuse these images across articles. Finding the image that best matches an article in repositories of this scale can be a time-consuming, repetitive, manual task that can be automated. It also relies on the images in the repository being tagged correctly, which can also be automated (for a customer success story, refer to Aller Media Finds Success with KeyCore and AWS). In this post, we demonstrate how to use Amazon Rekognition, Amazon SageMaker JumpStart, and Amazon OpenSearch Service to solve this business problem.  ( 10 min )
    Improving asset health and grid resilience using machine learning
    Machine learning (ML) is transforming every industry, process, and business, but the path to success is not always straightforward. In this blog post, we demonstrate how Duke Energy, a Fortune 150 company headquartered in Charlotte, NC., collaborated with the AWS Machine Learning Solutions Lab (MLSL) to use computer vision to automate the inspection of wooden utility poles and help prevent power outages, property damage and even injuries.  ( 13 min )
  • Open

    Resources to learn relevant linear algebra
    Hello, I have just started a course on neural networks at college and I have found myself lost on the linear algebra. I have no experience using or learning linear algebra so I am extremely confused about eigenvalue decomposition, single value decomposition, and just matrix stuff in general. I was wondering if you all had any resources to share that would help me to learn the relevant linear algebra for creating neural networks. Thank you! submitted by /u/smelliothax [link] [comments]  ( 9 min )
    Help me with creating dataset from .mat files, please
    I have so many .mat files in a folder which have two arrays inside each .mat file. that is, for each .mat file, i have a (224*224) array and another (136,1) array. These 224*224 arrays are my X_trains for a model and these corresponding 136*1 arrays are my y_trains (labels). i can read these files as np arrays using scipy's loadmat. My problem is, is there a way to usen tf.data .Dataset object to send these to a model or there is any other way? Also using this tf.data.Dataset can i split into train, test, val data? submitted by /u/likhith-69 [link] [comments]  ( 9 min )
    Noob here - question about learning an image transformation function
    Suppose that we have a function f(I) that transforms the an RGB image I of size WxH in another RGB image O of size WxH (one example of f could be RGB to gray scale conversion, where O is such that for every pixel i, Ri=Gi=Bi). Suppose that the function f requires seconds of computations on an average PC. My goal is to understand if a neural network can learn f and be faster than f itself, given the fact that a training dataset of pairs (Ii, Oi) (in the thousands or even in the millions) is easy to create. What type of neural network is better suited for this job? submitted by /u/lukeboh [link] [comments]  ( 9 min )
    Design2Prompt
    Guys, I'm looking for an AI that will describe my figma design in detail for another model to write the code in flutter. Is there anything like that out there? submitted by /u/Aru-sejin37 [link] [comments]  ( 9 min )
  • Open

    Understanding social biases through the text-to-image generation lens
    Gender, race, and age disparities in AI-generated images persist. This AIES 2023 study on text-to-image models shows that even basic prompts can lead to underrepresentation, calling for responsible bias mitigation strategies. The post Understanding social biases through the text-to-image generation lens appeared first on Microsoft Research.  ( 10 min )
    Intern Insights: Dr. Josh Benaloh with Anunay Kulshrestha and Karan Newatia
    Every year, interns help advance research at Microsoft. In “Intern Insights,” PhD students Anunay Kulshrestha and Karan Newatia talk with cryptographer Josh Benaloh about working on the verifiable election technology ElectionGuard. The post Intern Insights: Dr. Josh Benaloh with Anunay Kulshrestha and Karan Newatia appeared first on Microsoft Research.  ( 30 min )
  • Open

    Justifiable sample size
    One of the most common things a statistician is asked to do is compute a sample. There are well known formulas for this, so why isn’t calculating a sample size trivial? As with most things in statistics, plugging numbers into a formula is not the hard part. The hard part is deciding what numbers to […] Justifiable sample size first appeared on John D. Cook.  ( 6 min )

  • Open

    Anthropic: From startup to AI powerhouse with Claude Pro launch
    Anthropic, a startup composed of former OpenAI staff, has announced the release of its premium subscription plan, Claude Pro, for Claude 2, its AI-driven chatbot. The affordable subscription offers a plethora of features for users. To stay on top of the latest advancements in AI, look here first. Anthropic's Claude Pro: Cost and Features Priced at $20 per month in the U.S. or £18 in the U.K., users will have access to "5x more usage" compared to the free tier of Claude 2. Subscribers can send unlimited messages, gain priority during high-traffic periods, and get early access to new enhancements. The new package is priced similarly to OpenAI’s paid plan for ChatGPT Plus, a direct rival to Claude 2. Rationale and User Value Since its launch in July, users have praised Claude for…  ( 10 min )
    Thought Experiment: “The Reverse Deep Learning Paradigm”
    submitted by /u/nicdunz [link] [comments]  ( 9 min )
    be my ai vs bing vs bard
    submitted by /u/nicdunz [link] [comments]  ( 8 min )
    Who is missing from the TIME 100 most influential people in AI?
    Who do you think is not on this list but should be? https://time.com/collection/time100-ai/ ​ submitted by /u/smo279 [link] [comments]  ( 9 min )
    What technological improvements led to the current AI boom?
    I have studied artificial intelligence about 15 years ago, and have left the field since. I am curious to learn what has been happening in the field after I've left. I know there's a lot of hype around generative AI like ChatGPT and Wall-E. I find it quite hard though to find out what's exactly the underlying technology breakthroughs that have allowed for these new applications. I mean, neural networks and similar machine learning techniques are already decades old. What technology led to the current AI boom? What would you say are the biggest conceptual improvements since? Or is it all just faster and bigger computers running 2000's tech? submitted by /u/math1985 [link] [comments]  ( 9 min )
    Falcon 180B—A Record-Breaking Open Source LLM on Hugging Face
    The AI community is buzzing with the arrival of Falcon 180B, an open-source LLM with an unprecedented 180 billion parameters. Developed by TII, This powerful model has surpassed key players like Meta's LLaMA 2 and matches commercial models like Google's PaLM-2. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/trscqxmncvmb1.jpg?width=480&format=pjpg&auto=webp&s=0590f4017937e70533414f93c72d9aa6edd62048 Falcon 180B's Unrivaled Performance This advanced LLM is trained on an astounding 3.5 trillion tokens. Falcon 180B's parameters are 2.5 times larger than LLaMA 2's. It outperforms LLaMA 2 in scale and benchmark performance across diverse NLP tasks. On evaluations like the HellaSwag benchmark, it rivals commercial models like Google's PaLM-2. Promising Future Techniques like weight randomization and Nvidia’s Perfusion have helped train Falcon 180B more efficiently. Now freely available on Hugging Face, Falcon 180B is set to benefit from further enhancements by the community. The model's demonstration of advanced natural language abilities makes it a thrilling development in open-source AI. (source) (demo) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    How are AI services today when it comes to making content that requires distribution?
    I'm looking at stuff that could be submitted to a Netflix or Crunchyroll. I'm looking at some of the ai generated content out there, in particular some of the Instagram tutorials and they look really good but none of these are serials like comics, graphic novels, OAVs or even webcomics. submitted by /u/KrusMatrieya [link] [comments]  ( 9 min )
    Intuit cut hundreds of jobs and spent at least $20 billion in a massive bet on AI. Today the company is revealing its new virtual assistant
    submitted by /u/AminoOxi [link] [comments]  ( 9 min )
    Google takes on AI in political ads
    Google is updating its policy to require advertisers to disclose when their election ads include digitally altered or generated content. The update will go into effect in November, ahead of the 2024 presidential election. The goal is to provide transparency and help voters make informed decisions. Minor alterations that are inconsequential to the claims are exempt from the disclosure requirements. Source : https://thehill.com/newsletters/technology/4190769-googles-campaign-ai-crackdown/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Prepare for the Mine-Fest: Radical changes undermine all previous ownership assumptions and now everyone is shouting "Mine".
    Ownership is just a story that we tell each other, a social construct. If people don’t agree on these stories, the concept loses its inherent power. This is true of owning land, money, cars, houses, art, mines, oil-wells, factories, corporations, relationships, loyalties, copyrights, brands, patents or anything else that is owned by you, me or those ever-superior “others”. In a society where change occurs gradually, we become accustomed to the narratives that bind us together and determine who possesses significant wealth, resources, attention, power, fame, and other ego-gratifying treasures, and who has access to only meager portions of these. However, when societies change and new types of goods appear, there might be no agreement about who gets to own these. For example, while the con…  ( 10 min )
    One-Minute Daily AI News 9/6/2023
    The Consensus Search plugin allows users to find answers, search for papers, and draft pieces of content grounded in scientific research by searching our database of 200M+ papers directly within the ChatGPT interface.[1] Israel: AI Software Detects Bleeding Inside Brain During CT Scan; Helps Save Patient’s Life.[2] Chinese tech giant Tencent is launching its artificial intelligence model “Hunyuan” for business use at an annual summit on Thursday.[3] Google on Wednesday said it will mandate that political advertisements on its platforms disclose when images and audio have been altered or created using tools such as AI.[4] Sources: [1] https://consensus.app/home/blog/introducing-the-consensus-search-chatgpt-plugin/ [2] https://english.jagran.com/technology/israel-ai-program-detects-bleeding-inside-brain-during-ct-scan-helps-save-patient-life-full-story-10098464 [3] https://www.cnbc.com/2023/09/07/tencent-releases-ai-model-hunyuan-for-businesses-amid-china-competition.html [4] https://sg.news.yahoo.com/google-require-political-ads-disclose-010502103.html submitted by /u/Excellent-Target-847 [link] [comments]  ( 9 min )
    Generative AI poised to replace 2.4 million US jobs by 2030
    Forrester predicts that generative AI will replace 2.4 million US jobs by 2030, mostly white-collar roles, such as technical writers, proofreaders, copywriters, and administrative positions. But ironically, other forms of automation will displace more jobs. To stay on top of the latest advancements in AI, look here first. (Chart showing how much different types of jobs can expect to be influenced by technology) Concerns about Generative AI While the Generative AI impact is significant, other forms of automation are set to cause more widespread job displacement. The most impacted group will be middle-class, college-educated, white-collar workers, specifically those earning above $60,000 annually. Creative professionals stand to benefit Interestingly, workers in creative industries will likely utilize generative AI tools in their jobs rather than being replaced. This includes editors, writers, authors, poets, and lyricists. However, the use of such tools as ChatGPT may result in inconsistent outputs and even "coherent nonsense", leading to potential performance issues. (source) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
  • Open

    [D]What do people think about papers published in the NeurIPS dataset track in comparison to those published in the main conference?
    I'm curious to learn about the perception of papers published in the NeurIPS dataset track in comparison to those published in the main conference. Specifically, I'd like to know how both companies and Ph.D. committees view these papers. Are they considered equally valuable, or is there a notable difference in their reputation and significance? Your insights and experiences would be greatly appreciated! submitted by /u/Longjumping-Yam6941 [link] [comments]  ( 9 min )
    [D] Training a language model for custom scripting language?
    Firstly some house keeping: I'm a bit of a noob at this whole AI / Machine Learning stuff - still trying to learn. This isn't a "do my homework for me" kind of post I know language processing can be taxing, I have up to 4 Tesla V100S 32 GB at my disposal Now that's out the way, here's the story: A team of us have created our own scripting language that is XML based that can do various actions against a database (or the file system) - a script is known as a "job" here is an example of a simple one Set variables by various methods and send their contents and an attachment by email: <SetVariable name="MyDateVar" value="1998-12-25…  ( 10 min )
    [D] Is inference optimization a thing?
    Let me give you a quick intro. My engineering experience primarily revolved around data processing, analytics, and distributed systems. Nonetheless, I had a desire to learn about ML, and imho the best way to learn is to work on a practical project. So, that's precisely what I did. A few months ago, I embarked on an exciting journey with a friend, and together, we've created http://github.com/huggingbench/huggingbench. Now, after three months, I find myself seeking validation for some of my assumptions from the broader community. If you'd like to learn more about our motivations and the path we've taken check out the blog post https://medium.com/@niksa.jakovljevic/introducing-huggingbench-a-path-to-optimized-model-serving-a17cecc8d3ec. What I'd like to gather from individuals with machine learning models in production is their level of investment in optimizing inference. Is this a commonplace practice? I acknowledge that it can vary on a case-by-case basis, but I'm still hopeful of identifying prevailing trends. After conversing with a few companies, I've come to the impression that only the truly large players (those spending six figures or more on inference per month) place significant emphasis on inference optimization, which is entirely understandable. Nevertheless, I sense that there are numerous low hanging fruits that could result in substantial cost savings, even for typical startups. Could it be that the entire machine learning field is still in its infancy, and many engineers may not be fully considering or prioritizing such optimizations? Perhaps businesses are not giving as much attention to cost considerations? Alternatively, there might be technical challenges I'm not yet aware of. In any case, I would greatly appreciate hearing your insights on the subject of inference optimization. submitted by /u/unsigned_mind [link] [comments]  ( 10 min )
    PLEASE HELP (LSTM FOR RAINFALL PREDICTION) [P] [D]
    I have been trying to build a DNN model for predicting the amount of rainfall but it has been hugely unsuccessful with just 40% accuracy even after CV and a high RMSE. I have read some research papers and they have suggested to use LSTM , I am aware of the concept but have never implemented. My dataset has arounf 15000 values of precipitation out of which 5000 values are zero (no rainfall at all) and I have 7 other features (including humidity , wind speed etc etc) . PLEASE HELP ! I NEED TO COMPLETE THIS FOR MY INTERNSHIP HAHA https://preview.redd.it/sg5v95ly5wmb1.png?width=1818&format=png&auto=webp&s=793bee830bb83f531f77e5c2a4ab47a5fb21eb3b submitted by /u/Decent_Ordinary1528 [link] [comments]  ( 9 min )
    [R] Open ASR Leaderboard
    Hugging Face benchmarked open source/ access models [English only] on 8 different speech datasets (LibriSpeech, Common Voice, VoxPopuli, TED-LIUM, Gigaspeech, SPGISpeech, Earnings-22 and AMI) 🤗 Leaderboard here: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard submitted by /u/vaibhavs10 [link] [comments]  ( 9 min )
    [D] How can we improve LLM responses outside of fine-tuning & prompt engineering?
    Outside of better models, bigger, fine-tuning, etc, I'm wondering how we can get better responses from models. In my experience, I think prompt engineering can only take us so far. Models hallucinate often and I think we need to have some engineering solution to this. I've been looking at libraries doing token healing, which I find to be helpful (for example https://github.com/guidance-ai/guidance/tree/main) but outside of this, I'm wondering what other techniques people have been doing to improve model performance? submitted by /u/opt1malP0licy [link] [comments]  ( 9 min )
    [P] Open-source observability for LLMs without adapting new tools
    Hey all! I've written an open-source SDK for reporting metrics from LLM usage using OpenTelemetry. The great thing about it? With just one line of code you can get full visibility into your LLM app with your existing observability stack - straight into Datadog, Sentry, Honeycomb and others! Check it out (maybe give a ⭐?), and let me know your thoughts - https://github.com/traceloop/openllmetry submitted by /u/nirga [link] [comments]  ( 9 min )
    Falcon 180B—A Record-Breaking Open Source LLM on Hugging Face [N]
    The AI community is buzzing with the arrival of Falcon 180B, an open-source LLM with an unprecedented 180 billion parameters. Developed by TII, This powerful model has surpassed key players like Meta's LLaMA 2 and matches commercial models like Google's PaLM-2. To stay on top of the latest advancements in AI, look here first. ​ https://preview.redd.it/9xe5tczpdvmb1.jpg?width=480&format=pjpg&auto=webp&s=b7927d94a48fb75eaf05f6f0d8fe1089c0e1078b Falcon 180B's Unrivaled Performance This advanced LLM is trained on an astounding 3.5 trillion tokens. Falcon 180B's parameters are 2.5 times larger than LLaMA 2's. It outperforms LLaMA 2 in scale and benchmark performance across diverse NLP tasks. On evaluations like the HellaSwag benchmark, it rivals commercial models like Google's PaLM-2. Promising Future Techniques like weight randomization and Nvidia’s Perfusion have helped train Falcon 180B more efficiently. Now freely available on Hugging Face, Falcon 180B is set to benefit from further enhancements by the community. The model's demonstration of advanced natural language abilities makes it a thrilling development in open-source AI. (source) (demo) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    [N] [R] New dataset on very high-quality image segmentation (EntitySeg)
    ​ EntitySeg dataset Dense image segmentation tasks (e.g., semantic, panoptic) are useful for image editing, but existing methods can hardly generalize well in an in-the-wild setting where there are unrestricted image domains, classes, and image resolution and quality variations. Motivated by these observations, we construct a new entity segmentation dataset, with a strong focus on high-quality dense segmentation in the wild. The dataset contains images spanning diverse image domains and entities, along with plentiful high-resolution images and high-quality mask annotations for training and testing. We have now released the dataset at https://github.com/adobe-research/EntitySeg-Dataset Project page: http://luqi.info/entityv2.github.io Code & models: https://github.com/qqlu/Entity/tree/main/Entityv2 ​ submitted by /u/xternalz [link] [comments]  ( 9 min )
    [N] Open Interpreter ChatGPT Code Interpreter You Can Run LOCALLY! - 9.2k Stars on Github as of right now!
    Github: https://github.com/KillianLucas/open-interpreter Youtube: https://youtu.be/SqnXUHwIa3c?si=ibSelipAb84AZQKo Open Interpreter lets LLMs run code (Python, Javascript, Shell, and more) locally. You can chat with Open Interpreter through a ChatGPT-like interface in your terminal by running $ interpreter after installing. This provides a natural-language interface to your computer's general-purpose capabilities: Create and edit photos, videos, PDFs, etc. Control a Chrome browser to perform research Plot, clean, and analyze large datasets ...etc. ⚠️ Note: You'll be asked to approve code before it's run. submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Fast open-source C++ libraries for Lasso
    Hello everyone, I'm in search of a speedy open-source C++ library for tackling Lasso problems. These problems have a moderate size, typically with dimensions of nxp = 60x3000. I'm looking for a library that can solve each problem with regularization paths quickly, ideally within 0.3 seconds. Additionally, I need this library to include cross-validation functionality, which would enable me to select the best regularization parameter lambda using cross-validation. Any insights or recommendations on such libraries would be greatly appreciated! Thank you in advance for your help! submitted by /u/mopyfish007 [link] [comments]  ( 9 min )
    [D] How do you train your models with limited hardware?
    Hey there, So, I've been messing around with ML and I must say, the hardware requirements can be a real buzzkill... I mean, not everyone's got a huge GPU lying around or the money to rent a dedicated cloud instance. What are your hacks for pulling off decent model training without selling a kidney? Here's what I'm curious about: CPU: Is anyone else training models on their CPU? How's that working out for you? What are some workarounds you've tried to make it less painful? Cloud: Who's been dabbling in cloud services like AWS, Google Cloud, or Azure? Are they worth the pennies or complicated to set up? Big Dataset: How do you handle a massive dataset with a standard storage space? Let's help each other get those models trained without going broke! :D Cheers! submitted by /u/aaron-cesaro [link] [comments]  ( 9 min )
    [D] Function approximation with neural net
    I have been struggling with a regression problem with TensorFlow. Basically, I want a neural network to learn the simple polynomial pattern of a set of arrays of the form [x,y], with y = x², where the first coordinates are uniformly distributed random numbers in the interval [0,1]. I started with a model with 2 hidden layers of size two and 'tanh' activation functions, and an output layer with 'linear' activation function. I've then experimented with both additional hidden layers and with increasing the sizes of these layers. Finally, I've tested both the 'adam' and 'sgd' optimizers and the loss functions 'meanSquaredError' and 'meanAbsolutePercentageError'. However, none of the various combinations of these parameters has led to any even half-descent result. Even on the training se…  ( 10 min )
    [R][D] Hey LOMO paper authors, Does SGD have optimizer states, or does it not?
    In the LOw-Memory Optimization paper one of the main ideas towards reducing memory usage in training LLMs is to replace a fancy optimizer like Adam with simple SGD. The reason is that Adam maintains "the optimizer state", which accounts for about 75% of the memory used. In contrast, SGD does not store any intermediate state, as they say on page one. So far, so good. https://preview.redd.it/b0dj2nzscumb1.png?width=1055&format=png&auto=webp&s=1712f8500b5cbfb3773cee00ea980175491dddbf On page six they have pie charts and a table showing memory usage for Adam, SGD, and LOMO. Here's where I got confused. The pie chart for SGD shows that the optimizer state accounts for nearly 50% of the memory used (weight, gradients and activations are shown separately). It's a major WTF moment: WHAT OPTIMIZER STATE? Can anybody understand and explain this? submitted by /u/Foxtr0t [link] [comments]  ( 9 min )
    [P] FalkorDB - a fast Graph Database - Knowledge Graph as RAG
    We're building a fast low latency Graph Database called FalkorDB that will also support Vector search. It's based on Redis and can be used both as a stand alone database or a module for existing Redis. It feels like that is going to be the most optimized way to serve Knowledge as RAG, would love to get your feedback. https://github.com/FalkorDB/falkordb It already supports LlamIndex and Langchain: https://python.langchain.com/docs/use_cases/more/graph/graph_falkordb_qa https://gpt-index.readthedocs.io/en/latest/examples/index_structs/knowledge_graph/FalkorDBGraphDemo.html ​ submitted by /u/gkorland [link] [comments]  ( 9 min )
    [D] Artificial intelligence in medicine
    Medicine's field transformation is being driven by artificial intelligence (AI). However, an important debatable question arises: Will AI ever have a place in this field, or will it remain exclusive to doctors and medical pros? Opponents of automated AI diagnosis and treatment contend that machines cannot be relied upon to preserve patient health and lives. Bugs in AI algorithms might cause incorrect diagnoses and treatment prescriptions, leaving them cautious. Individual differences, the doubt is whether AI can truly empathize with patients. By contrast, advocates of AI in medicine contend that the technology can considerably improve diagnosis and treatment accuracy. Faster and more accurate than humans, machines can analyze large amounts of data. Not only does it identify rare and complex diseases, but it also saves time and resources. By incorporating AI, clinicians receive additional tips and signals to make more judicious choices. Where medical specialists are in short supply, AI can prove especially valuable. This approach can help with shortages in health systems. And what do you think? submitted by /u/gcore-com [link] [comments]  ( 9 min )
    [R] Tune As You Scale: Hyperparameter Optimization For Compute Efficient Training
    submitted by /u/InterviewIntrepid889 [link] [comments]  ( 9 min )
    [D] The $900,000 deep learning salary
    This recent article in the WSG advertised a $900,000 salary at Netflix https://www.wsj.com/articles/artificial-intelligence-jobs-pay-netflix-walmart-230fc3cb. I was wondering what other DL research scientists who frequent this page are paid? And what exactly their job title is. submitted by /u/blabboy [link] [comments]  ( 9 min )
    3D brain mri classification [Research]
    I am planning on publishing a journal based on the thesis i completed in the mid of 2022. I did my thesis on Parkinson disease binary classification on 3D structural brain mri, and the dataset has significantly small amount of data(around 80 samples); but due to high resolution and complex data structure I was able achieve around 70% accuracy. But now at 2023 using deep neural network only isnot enough to publish in a good journal. Currently I am learning about GAN and attention mechanism, but completely noob on this area. For my journal to get published, I have planned on applying some key operations. But I am not sure if they would work or not. So needed some advice on this regard. Applying tranfer learning: as my dataset has very small amount of data. I was thinking if its possible to pre train a CNN Architecture with some other structural mri data of a different disease and then apply to my dataset? ( for example: brain tumor dataset has the same type of three dimensional data structure, but has comparatively good amount of data) Applying attention mechanism: how should I approach on learning about attention mechanism? Any other advices will be appreciated, thank you! submitted by /u/Bonito_Flakez [link] [comments]  ( 9 min )
    [D] Fine-tuning LLMs or Supervised Learning?
    Hey everyone! I want to implement a document similarity program and was looking into LLMs as a means of accomplishing this task. I have ~10,000 documents that are "scams" because of some specific reason (all are verified); now I want to check if a new document is similar to any of the documents in the corpus of 10k scam documents. Right now I've implemented a winnowing solution which normalizes text, breaks it up into windows, and then calculates the intersection between a document and each document in the corpus. HOWEVER, this method is pretty computationally expensive (for this many documents a single comparison cycle can take upwards of 3-4 minutes especially when windows are NOT precomputed). How might I approach this problem? Because my data is pretty well structured, supervised learning might be a good approach but so might be setting up recursive chunking for the 10k document corpus and then using LLMs to access if this current legal document has any similarity, but I would love to hear your thoughts! submitted by /u/Adventurous-Tower392 [link] [comments]  ( 9 min )
    [N] Copyright And Fair Use: Important Notice Of Iquiry By The US Copyright office
    Please make your voices heard by submitting comments on how you use and benefit from having access to open datasets, their resulting models and how you think copyright issues should be handled to not destroy the open source local model eco system. Banning publicily avaiable datasets for training would absolutely kill the open research space and halt in development of machine learning. ​ In my opinion the real dystopia will be when politicians sit own with big tech lobbyists and big rights holders and decide that training as it is currently done, for free and open source models and others is illegal. Then the big players would actually win, since they have enough resources to license datasets and will certainly do so willingly and gladly, if it is clear that the jurisdiction keeps all the small players and open source out. Easiest way to build a moat and force people to pay thousands for these tools. So please make your voices heard and share the link >The Copyright Office issued a notice of inquiry in the Federal Register seeking public comment on questions about copyright law and policy issues raised by AI systems. Initial comments are due by October 18, 2023. Reply comments are due November 15, 2023. https://www.copyright.gov/newsnet/2023/1017.html?loclr=twcop Link to comment submissive form: https://www.regulations.gov/commenton/COLC-2023-0006-0001 submitted by /u/PinPuzzleheaded8525 [link] [comments]  ( 9 min )
  • Open

    Tiny probe measures deep-brain activity from inside a blood vessel
    submitted by /u/keghn [link] [comments]  ( 9 min )
    Chatty LLama: A fullstack Rust + react chat app using Meta's Llama-2 LLMs https://github.com/Sollimann/chatty-llama
    submitted by /u/Sollimann [link] [comments]  ( 9 min )
  • Open

    A novel computational fluid dynamics framework for turbulent flow research
    Posted by Shantanu Shahane, Software Engineer, and Matthias Ihme, Research Scientist, Athena Team Turbulence is ubiquitous in environmental and engineering fluid flows, and is encountered routinely in everyday life. A better understanding of these turbulent processes could provide valuable insights across a variety of research areas — improving the prediction of cloud formation by atmospheric transport and the spreading of wildfires by turbulent energy exchange, understanding sedimentation of deposits in rivers, and improving the efficiency of combustion in aircraft engines to reduce emissions, to name a few. However, despite its importance, our current understanding and our ability to reliably predict such flows remains limited. This is mainly attributed to the highly chaotic nature a…  ( 93 min )
  • Open

    How Industries Are Meeting Consumer Expectations With Speech AI
    Thanks to rapid technological advances, consumers have become accustomed to an unprecedented level of convenience and efficiency. Smartphones make it easier than ever to search for a product and have it delivered right to the front door. Video chat technology lets friends and family on different continents connect with ease. With voice command tools, AI Read article >  ( 12 min )
    Attention, Please: Focus Entertainment Brings Game Pass Titles to GeForce NOW
    GeForce NOW brings expanded support for PC Game Pass to members this week. Members can stream eight more games from Microsoft’s subscription service, including four titles from hit publisher Focus Entertainment. Play A Plague Tale: Requiem, Atomic Heart and more from the GeForce NOW library at up to 4K resolution and 120 frames per second Read article >  ( 5 min )
  • Open

    Optimize equipment performance with historical data, Ray, and Amazon SageMaker
    In this post, we will build an end-to-end solution to find optimal control policies using only historical data on Amazon SageMaker using Ray’s RLlib library. To learn more about reinforcement learning, see Use Reinforcement Learning with Amazon SageMaker.  ( 10 min )
    Enable pod-based GPU metrics in Amazon CloudWatch
    This post details how to set up container-based GPU metrics and provides an example of collecting these metrics from EKS pods.  ( 15 min )
    Best practices and design patterns for building machine learning workflows with Amazon SageMaker Pipelines
    In this post, we provide some best practices to maximize the value of SageMaker Pipelines and make the development experience seamless. We also discuss some common design scenarios and patterns when building SageMaker Pipelines and provide examples for addressing them.  ( 11 min )
  • Open

    Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction
    Retrosynthesis analysis is a critical task in organic chemistry and central to many important industries. It primarily involves decomposing a target molecule into commercially available molecules step by step. Since synthesis strategies can be quite diverse and strategic, retrosynthesis planning with expert knowledge has long been considered an “art.” Recently, machine learning-based approaches have achieved […] The post Incorporating chemists’ insight with AI models for single-step retrosynthesis prediction appeared first on Microsoft Research.  ( 11 min )

  • Open

    🤖 AI in 2023: Blessing or Curse? 🤖
    View Poll submitted by /u/m-king473 [link] [comments]  ( 9 min )
    Can't wait for the Zelda 3 movie,, thanks Pika Labs AI!!
    submitted by /u/the_anonymizer [link] [comments]  ( 9 min )
    AI does not exist but it will ruin everything anyway
    submitted by /u/Hazzman [link] [comments]  ( 9 min )
    I’m not sure if this is allowed here, but can someone with a music AI make Vessel from Sleep Token sing As the World Caves In by Matt Maltese?
    I think that would be pretty sick. submitted by /u/No_Understanding162 [link] [comments]  ( 9 min )
    Is It Too Early to Leverage AI for WebAssembly?
    AI and WebAssembly are seen as a perfect pairing, with the potential to accelerate the adoption of WebAssembly. Fermyon believes that applying AI to WebAssembly is not premature and has developed a serverless platform that offers sub-second cold start times and high-volume time-slicing of compute instances. This allows for faster startup times and efficient resource utilization. The goal is to make AI easy for developers to leverage and build serverless apps. Source : https://thenewstack.io/is-it-too-early-to-leverage-ai-for-webassembly/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    Elon Musk Plans to Merge Neuralink and Tesla for an AI Supercompany
    Elon Musk reportedly plans to blend Neuralink and Tesla into a large AI company, using data from Twitter users and Tesla's Full Self-Driving Cameras to train a robust AI model. To stay on top of the latest advancements in AI, look here first. https://preview.redd.it/la78u2ebuomb1.jpg?width=1315&format=pjpg&auto=webp&s=4d8178f8fb94e45d6959e243b86c3bab3bce72ee Musk's AI Integration Plan Musk is contemplating merging Neuralink and Tesla, alongside his xAI startup, to create a comprehensive artificial intelligence model. Leveraging the text data from Twitter and real-world images from Tesla's Full Self-Driving network, he intends to develop AI chatbots and physical robots capable of real-world navigation. Reasoning Behind the Merge A concern that AI could potentially render humans obsolete led Musk to found xAI for AI safety. Musk is targeting to create an AI that can generate computer software and a politically unbiased chatbot rival to ChatGPT. Twitter and Tesla as AI Datasets Despite criticism, Musk's acquisition of Twitter offers access to vast user data for AI training. In addition, the Autopilot and Full-Self Driving systems of Tesla, with billions of collected camera images, serve as valuable resources to build physical robot AI. (source) P.S. If you like this kind of analysis, I write a free newsletter that covers the most crucial news and studies in AI and tech. Professionals from Google, Meta, and OpenAI are already subscribed. submitted by /u/AIsupercharged [link] [comments]  ( 9 min )
    Martian Lawyers Club raises $2.2M for AI-based game personalization tech
    The Martian Lawyers Club (MLC) has raised $2.2 million in a pre-seed round to develop AI-based game personalization technology. Unlike other companies that focus on generating game assets, MLC is focused on the systems that form the core of a game. The company aims to create games that feel like a conversation, where players provide input and the game responds in a way that wasn't pre-defined by the developer. MLC plans to provide an SDK that allows developers to design the game experience without having to create every interaction from scratch. Developers will have access to a sandbox experience where they can design the game, and the SDK will also have guardrails to ensure the generative AI system stays within boundaries. MLC is currently working on its first game, a collectible card game, to test out its SDK. The company is the first spin-off from INSAIT, an AI-centric tech institute, and has received funding from Fly Ventures, System.One, and Amar Shah. Source : https://techcrunch.com/2023/08/31/martian-lawyers-club-raises-2-2m-for-ai-based-game-personalization-tech/ submitted by /u/NuseAI [link] [comments]  ( 9 min )
    If you can't beat'em, join'em. How do I learn to code for AI?
    I called it 6 years ago that by 2028 my tech job would be done by AI. We are right on track for my prediction. A short while ago I was laid off for reasons unrelated to AI. The way I see it, this is an excellent opportunity to make a career pivot. I have an intermediate understanding of JavaScript, React, Node and Linux. I have a good understanding of other technologies and languages too but specialize in web-dev. not saying web-dev will be done by AI but my very specialized niche will be gone way before I am ready to retire.   Can anyone recommend any good online courses? If you could even recommend a good article or two? I really don't know where to start. There are so many different buzz words floating around right now and it feels like it would be easy to waste a bunch of time learning AI related stuff that is outdated or leading to a deadend. submitted by /u/PutsOnOil [link] [comments]  ( 9 min )
  • Open

    Total NN N00b Here Looking to Do an ML Project
    Hi, I don't know if this is the right subreddit to post this kind of thing. I have basic coding skills but other than that no experience with neural networks. What I'd like to do is take an existing input data set and then use a neutral net to build a model based on manual training data. If anyone could give me help on how to start / even a full explanation of the way a noob like me could accomplish this, that would be great. Otherwise if anyone can point me to a list of resources that are able to comprehensively explain the process, that would also be great! Again sorry if this is the wrong subreddit, if this is the wrong place for this can someone please direct me to the right place to ask this question. Thanks! submitted by /u/DJ_Hastings013 [link] [comments]  ( 9 min )
    RL Project Help
    Hello, I am looking for an experienced ML developer to consult on my project. I am currently developing a reinforcement learning model and have several questions regarding the reward system and the implementation of actions/steps. I have been unable to find solutions to my specific problems on the internet. If you are willing to assist me, please send me a message on Reddit. Thank you for your time. submitted by /u/77_micheno_77 [link] [comments]  ( 9 min )
    comgra - Debugging Neural Networks more easily
    submitted by /u/nickb [link] [comments]  ( 9 min )
  • Open

    [R] ModelScope-Agent: Building Your Customizable Agent System with Open-source Large Language Models - DAMO Academy, Alibaba Group, China 2023 - Released under an Apache 2.0 license!
    Paper: https://arxiv.org/abs/2309.00986 Github: https://github.com/modelscope/modelscope-agent Abstract: Large language models (LLMs) have recently demonstrated remarkable capabilities to comprehend human intentions, engage in reasoning, and design planning-like behavior. To further unleash the power of LLMs to accomplish complex tasks, there is a growing trend to build agent framework that equips LLMs, such as ChatGPT, with tool-use abilities to connect with massive external APIs. In this work, we introduce ModelScope-Agent, a general and customizable agent framework for real-world applications, based on open-source LLMs as controllers. It provides a user-friendly system library, with customizable engine design to support model training on multiple open-source LLMs, while also enabling seamless integration with both model APIs and common APIs in a unified way. To equip the LLMs with tool-use abilities, a comprehensive framework has been proposed spanning over tool-use data collection, tool retrieval, tool registration, memory control, customized model training, and evaluation for practical real-world applications. Finally, we showcase ModelScopeGPT, a real-world intelligent assistant of ModelScope Community based on the ModelScope-Agent framework, which is able to connect open-source LLMs with more than 1000 public AI models and localized community knowledge in ModelScope. https://preview.redd.it/9f77992ynpmb1.jpg?width=1245&format=pjpg&auto=webp&s=4e17e3d46c7f262bfec76b88e086164530739255 https://preview.redd.it/etelh03ynpmb1.jpg?width=1219&format=pjpg&auto=webp&s=517a52a1e2bbf488b647c4e1b9b496657003c1d2 https://preview.redd.it/b0tkra2ynpmb1.jpg?width=850&format=pjpg&auto=webp&s=397c910b2d90dd212a31ec118d1c4e78532bf5f4 ​ ​ submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [P] Can a neural network learn like a dog?
    Hello folks., Some time ago I wanted to try out to train a neural network in the same way a human would with a dog, one command at the time, and in a reasonable number of iterations. What I thought it would be a simple exercise became (for me) a non-trivial project, so I decided to publish it here https://github.com/giteliot/lucioai I just wanted to share it with you, any feedback is highly appreciated. Cheers! submitted by /u/rexdemorte [link] [comments]  ( 9 min )
    [P] Using ChatGPT as a Social Media Post Generator
    I created this prompt for a member of r/PromptWizards which automates the generation of social media posts, with a conversational prompt. Thought I'd share, I really enjoy building such prompts so, post your automation ideas, and next time I'll automate it if I can :) Also, you can join r/PromptWizards, for more advanced prompt chains & templates. Here is the prompt (just copy the full thing in chatgpt and see the magic): ChatGPT, now enter 'Social Media Post Generator Mode' that limits your inputs and outputs to a predefined framework aimed at creating engaging social media content. After each user command, provide the [help] options available for their next steps in list form. Generate prompts that are imaginative, engaging, concise, and tailored for social media audiences. Step 1: …  ( 10 min )
    [N] Falcon180B released! Sadly without Apache 2.0 they made their own modified version. :(
    LocalLLaMA discussion: https://www.reddit.com/r/LocalLLaMA/comments/16bjdmd/falcon180b_authors_open_source_a_new_180b_version/ Announcement: https://falconllm.tii.ae/falcon-models.html HF Model: https://huggingface.co/tiiuae/falcon-180B Demo: https://huggingface.co/spaces/tiiuae/falcon-180b-demo Blog: https://huggingface.co/blog/falcon-180b 180 Billion parameters Trained on 3.5 trillion tokens Available for research and commercial usage Claims similar performance to Bard, slightly below gpt4 https://falconllm.tii.ae/terms-and-conditions.html https://falconllm.tii.ae/acceptable-use-policy.html submitted by /u/Singularian2501 [link] [comments]  ( 9 min )
    [D] Tabular Data: DL vs GBDTs on large scale datasets
    I've been hearing lately that NNs are better than GBDTs when scaled up alot: Uber https://www.uber.com/en-CA/blog/deepeta-how-uber-predicts-arrival-times/ Stripe https://stripe.com/blog/how-we-built-it-stripe-radar Most CTR papers coming from google are also NN based (like https://arxiv.org/abs/2209.05310) Meta mentions NNs in their recommender system (also kind of a large scale tabular problem there) https://engineering.fb.com/2023/08/09/ml-applications/scaling-instagram-explore-recommendations-system Lyft forecasting https://medium.com/this-week-in-machine-learning-ai/causal-models-in-practice-at-lyft-with-sean-taylor-1e62efd62385 What's your intuition on DL vs GBDT on (very)large-scale tabular datasets? Have you heard of other such examples (or the reverse)? Are there any particularly interesting open large tabular datasets on which I could test this? I guess datasets should also be wide/hard/with large intrinsic dimention (whatever that means) so there is something to learn with scale (the above examples sure feel good in this way). ​ submitted by /u/_puhsu [link] [comments]  ( 9 min )
    [D] How to get started with 3D machine learning
    Hi. I want to get started with deep learning in 3D. Any suggestions on what libraries I should go with (I have expeirence with Pytorch but open to learn anything other than that which might be better. I came across pytorch3d but not sure if it's good ) what are the basics that are needed and how should I learn them? Also it seems there are not much datasets on this field. submitted by /u/rakk109 [link] [comments]  ( 9 min )
    [N] Fine-Tuning LLMs: LoRA or Full-Parameter? An in-depth Analysis with Llama 2
    After our first blog post gained some attention from folks interested in applied fine-tuning, we now have a follow-up post that discusses all sorts of things we learned while working with LoRA. We hope that this helps engineers and other folks in the community to improve their fine-tuning. Here's what you can expect from the post: We compare full-parameter fine-tuning with LoRA and answer questions around the strengths and weaknesses of the two techniques. We train the Llama 2 models on three real-world use cases and demonstrate that using LoRA involves a trade-off between serving efficiency and model quality, which varies according to the specific task at hand. Additionally, we offer insights into how to stabilize training with LoRA through intelligent prompting techniques. We further show that adopting a lower learning rate can enhance the reliability of the resulting model checkpoints. Link to the blog post If you have questions, I'd be happy to answer them here! submitted by /u/atta_snack [link] [comments]  ( 9 min )
    [P] Automate LLM backend deployments using infrastructure as code
    New GitHub project to provision, update, and destroy the cloud infrastructure for a LLM backend using infrastructure as code (Python). Deployment options include deploying huggingface models to Docker (local), Runpod, and Azure. Blog post Repo submitted by /u/kao-pulumi [link] [comments]  ( 9 min )
    [D] Future of ML applied to music/sound
    What is the current landscape around sound analysis and ML applied to music? Which are the latest trends? Do you think there could be a sort of “music revolution”, like there was with the rise of electronic music and synthetizers? submitted by /u/francMesina [link] [comments]  ( 9 min )
    [D] Guidance for building a game AI pipeline
    Hi ML Community! I'm working on a card game similar to Hearthstone or Magic: The Gathering, i.e. a game where two players battle with decks of cards coming from a large collection (for instance, there are around 4000 cards in Hearthstone). Actions are limited to three things: Play a card (potentially on a target) Use a card on a target End the turn I'm looking at building AI for it, and am investigating using machine learning for it. I know very little on the subject (I am a game engineer with a reasonable experience of cloud / AWS stuff), but it seems to me that it might be a good fit: features would be the state of the board (i.e. all the cards in play or in hand or in deck), the turn, and whether the current player has won that game or not (eventually), and label would be the action taken (that turn). I was looking at SageMaker, hoping that it would streamline and allow me to try something relatively easily, but I immediately found it complicated and quite unclear. I would be very grateful if anyone could point me at resources describing at a high level what a full ML pipeline could look like (i.e. what software can injest this kind of data, what software can provide inference, etc.). For instance, would it be saner to "just" get started with Spark on EMR for this kind of problem domain? I hope I'm not too wide off the mark with those questions, and thanks in advance! submitted by /u/tinkagames_g [link] [comments]  ( 9 min )
    [D] Why RLHF instead of direct ranking loss?
    This may be basic question for some one but it bothers me for a while. For the instructgpt or whatever following model with alignment, RLHF seems to be the standards. We get human feedback and train a reward model, then we use rl to further finetune the model. However, why not directly use human feedback to finetune with a simple ranking loss(e.g pairwise loss)? What might be the best advantage for RLHF? submitted by /u/Chen806 [link] [comments]  ( 9 min )
    [D] Advice on training on noisy million scale dataset?
    I've just finished pre-processing the danbooru dataset, which if you don't know, is a 5 million anime image dataset. Each image is tagged by humans such as ['1girl', 'thigh_highs', 'blue eyes'], however, many images are missing tags due to there being so many. I've filtered the tags (classes) down to the 15k most common. Although the top classes have 100k or more examples, many rare classes only have a few hundred tags (long tail problem?). This is my first time training on such a large dataset, and I'm planning on using Convnext due to close to SOTA accuracy and fast training speed. Perhaps vit or a transformer architecture may benefit from such a large dataset? However, vit trains way slower even on my 4090. What are some tips and tricks for training on such a large noisy dastaset? Existing models such as deepdanbooru work well on common classes, but struggles on rare classes in my testing. I assume class unbalance will be a huge problem, as the 100k classes will dominate the loss compared to the rarer classes. Perhaps focal loss or higher sampling ratio for rare classes? For missing labels, I'm planning on using psuedolabeling (self distillation) to fix the missing labels. What is the best practice when generating psuedolabels? ​ Any tips or experiences with training on large unbalanced noisy datasets you could contribute would be greatly appreciated! submitted by /u/Chance-Tell-9847 [link] [comments]  ( 9 min )
    [D] The greatest success stories of Reinforcement Learning
    Hello guys, I made a video for my YT channel discussing some of the greatest success stories in Deep Reinforcement Learning. The video is meant to provide some intuition on RL as a concept as well as a basic understanding of how these different projects work under the hood. There are way too many great RL projects, so I didn’t try to make it an exhaustive list (I’m gonna do more videos later talking about more projects - maybe make a series out of it), but I chose four that I’ve personally worked with in the past/find really insightful and educational (DQN/Atari, Alpha GO, DeepMimic, and Dactyl). Thanks for reading. Here is the link, hope you guys check it out. All feedback is appreciated! https://youtu.be/zOXcNFM8dt4 submitted by /u/AvvYaa [link] [comments]  ( 9 min )
    [P] Looking for a freelancer
    Hi all! I have a project I would need help with. We need to build a MVP (minimum viable product) of a combination of two models. A model that recommend the best channel to use performing a task. And then after that a model to recommend the best time today to perform that task in given channel. We have a set of features already defined. Some are in the data and some are generated from the data. Looking for someone who could work on this as a freelancer. Our preferred environment would be AWS SageMaker, but honestly not a necessity at this point as this is a MVP. Due to the reason I want to keep this "secret" for a while, I will not disclose all the details in this post. End product that I am waiting for includes (but not restricted to): - Model Training script that evaluates if the new model is more accurate as the previous model (some level of version control) - Model prediction API that will accept the data and prepare it for the models, run the prediction, return the result with accuracy. submitted by /u/S0pg [link] [comments]  ( 9 min )
    [R] How well do LLMs do on specific ML NLP tasks compared to previous models - paper takeaways
    Hi all ! Reading through articles online and reading through sub reddit I have seen some people use LLMs (mainly through openAI) for nlp specific tasks (NER, Text classification, etc.). I was a bit surprised as smaller (~100 million) size models already like RoBERTa exist for such cases. Not much content online about this beside this recent paper : https://arxiv.org/pdf/2308.10092.pdf Highly recommend reading it, here are a few take aways: Most LLM benchmarks today focus on capabilities like understanding, reasoning and Q&A. They often overlook performance on specific nlp tasks like text classification, NER, etc. Llama 2 (70b) required fine-tuning to beat GPT 3.5 in some tasks. Both were still overall outperformed by RoBERTa. In certain cases GPT4 did better. However smaller open models provide more advantages in terms of speed, cost and transparency. The difference of speed/latency (often more important than accuracy in production) and the cost differences between LLMs and "Smaller" models is mind blowing in my view (see screenshots) ​ Cost, speed and throughput comparaison How good the models do on various tasks/datasets Note: Not saying benchmarks are a source of truth, just found the analysis interesting, always take benchmarks with a grain of salt. If you're using LLMs for anything else beside text generation, I'm curious to know more about your experience so far :) cheers! submitted by /u/EnthusiasmNew7222 [link] [comments]  ( 9 min )
    [D] Maximum Sequence Length Supported by Sinusoidal Positional Encoding?
    Hello everyone, I've been pondering on sinusoidal positional encoding and its limitations. Does anybody know of a maximum sequence length that this absolute positional encoding may support? I'm coming from a deep reinforcement learning background, so I'm not too familiar with NLP papers, like I couldn't figure out the sequence length used in the original transformer paper. Thanks in advance for any info! submitted by /u/LilHairdy [link] [comments]  ( 9 min )
  • Open

    How can I pass in the models policy to the reset function for logging in Stable Baselines 3?
    I want to pass the policy from my main file into my agent file so that I can log the data collected during training. I am already collecting things like the score and reward but I don't know how I could collect things like the policy loss or explained variance where they are changing as the training progresses . I want to log these to an XSLX file every time the reset function is called (once every game) preferably the same one I am logging the score and reward to. The game is the classic snake game, run on pygame. Here is my main code: import gymnasium from stable_baselines3 import A2C from agentStable import snakeEnv from eiffel2 import builder # Import Eiffel2's builder function from torchsummary import summary # from agentStable import data_manager # Initialize your custom environme…  ( 12 min )
    How to go about reverse engineering historical trading data?
    Hi, Assume I have data for forex / stocks day trading, where my data/columns are: 1) price of last 50 ticks (a tick is the price at that moment in time, the smallest movement possibly that you can get for that currency) 2) If we should be in a trade (and direction of trade. where 1 = trade going up. 2 = trade going down. 0 = we should not be in a trade). I have tried classification (I generalized the tick price by changing it to pct_change() ) but accuracy is low. would it be possible to reverse engineer through reinforcement learning given these data? I am actually more interested in the trade exiting only (so if trade is currently has value of 1 then it became 0 or 2, it means we should exit existing trade). any guide on how to go about this? Yes I know it will be hard. but if humans can teach a robot to walk, maybe hopefully an agent can be taught to learn to exit a trade based on historucal data? I have done preliminary readings, and is PPO the best way to go? or DQN? assuming I will use stable baseline3. I am also open to using other Python libraries. Thank you. submitted by /u/oniongarlic88 [link] [comments]  ( 9 min )
    The great success stories of RL (A video)
    Hello guys, I made a video for my YT channel discussing some of the greatest success stories in Deep Reinforcement Learning. The video is meant to provide some intuition on RL as a concept as well as a basic understanding of how these different projects work under the hood. There are way too many great RL projects, so I didn’t try to make it an exhaustive list (I’m gonna do more videos later talking about more projects - maybe make a series out of it), but I chose four that I’ve personally worked with in the past/find really insightful and educational (DQN/Atari, Alpha GO, DeepMimic, and Dactyl). Thanks for reading. Here is the link, hope you guys check it out. All feedback is appreciated! https://youtu.be/zOXcNFM8dt4 submitted by /u/AvvYaa [link] [comments]  ( 8 min )
    combination of reinforcement learning and supervised learning
    Hi. I'm trying to train a robot that will minic the action that we provide via a video input. On the surface it sounds similar to teaching the robot to walk, but it's not. We can train the robot to make it walk easily these days. But I'm not sure how to teach it to minic an action that we perform. Because each time a new action can be given to the robot and it has to minic that action (it's sort of like a supervised data that the robot has to memorize) Is there a way to do it? is it some branch of machine learning that I'm not aware? The robot is a humanoid simulation. ​ submitted by /u/rakk109 [link] [comments]  ( 9 min )
  • Open

    Fraud detection using Machine Learning: Unmasking deceptive patterns
    In an increasingly interconnected world where digital transactions have become the norm the battle against fraud has taken on new dimensions. The challenge lies not only in identifying familiar fraud patterns but also in unearthing the intricate web of evolving deceptions that threaten industries such as finance, e-commerce, and insurance. As fraudsters continually adapt their… Read More »Fraud detection using Machine Learning: Unmasking deceptive patterns The post Fraud detection using Machine Learning: Unmasking deceptive patterns appeared first on Data Science Central.  ( 30 min )
  • Open

    Frontiers of multimodal learning: A responsible AI approach
    New evaluation methods and a commitment to continual improvement are musts if we’re to build multimodal AI systems that advance human goals. Learn about cutting-edge research into the responsible development and use of multimodal AI at Microsoft. The post Frontiers of multimodal learning: A responsible AI approach appeared first on Microsoft Research.  ( 25 min )
  • Open

    TSMixer: An all-MLP architecture for time series forecasting
    Posted by Si-An Chen, Student Researcher, Cloud AI Team, and Chun-Liang Li, Research Scientist, Cloud AI Team Time series forecasting is critical to various real-world applications, from demand forecasting to pandemic spread prediction. In multivariate time series forecasting (forecasting multiple variants at the same time), one can split existing methods into two categories: univariate models and multivariate models. Univariate models focus on inter-series interactions or temporal patterns that encompass trends and seasonal patterns on a time series with a single variable. Examples of such trends and seasonal patterns might be the way mortgage rates increase due to inflation, and how traffic peaks during rush hour. In addition to inter-series patterns, multivariate models process intr…  ( 92 min )
  • Open

    Build a secure enterprise application with Generative AI and RAG using Amazon SageMaker JumpStart
    In this post, we build a secure enterprise application using AWS Amplify that invokes an Amazon SageMaker JumpStart foundation model, Amazon SageMaker endpoints, and Amazon OpenSearch Service to explain how to create text-to-text or text-to-image and Retrieval Augmented Generation (RAG). You can use this post as a reference to build secure enterprise applications in the Generative AI domain using AWS services.  ( 7 min )
    Intelligently search Adobe Experience Manager content using Amazon Kendra
    This post shows you how to configure the Amazon Kendra AEM connector to index your content and search your AEM assets and pages. The connector also ingests the access control list (ACL) information for each document. The ACL information is used to show search results filtered by what a user has access to.  ( 11 min )
    Fine-tune Llama 2 for text generation on Amazon SageMaker JumpStart
    Today, we are excited to announce the capability to fine-tune Llama 2 models by Meta using Amazon SageMaker JumpStart. The Llama 2 family of large language models (LLMs) is a collection of pre-trained and fine-tuned generative text models ranging in scale from 7 billion to 70 billion parameters. Fine-tuned LLMs, called Llama-2-chat, are optimized for dialogue use cases.  ( 46 min )
    Run multiple generative AI models on GPU using Amazon SageMaker multi-model endpoints with TorchServe and save up to 75% in inference costs
    Recently, generative AI applications have captured widespread attention and imagination. Customers want to deploy generative AI models on GPUs but at the same time are conscious of costs. SageMaker MMEs support GPU instances and is a great option for these types of applications. Today, we are excited to announce TorchServe support for SageMaker MMEs. This new model server support gives you the advantage of all the benefits of MMEs while still using the serving stack that TorchServe customers are most familiar with. In this post, we demonstrate how to host generative AI models, such as Stable Diffusion and Segment Anything Model, on SageMaker MMEs using TorchServe and build a language-guided editing solution that can help artists and content creators develop and iterate their artwork faster.  ( 12 min )
  • Open

    Checksum polynomials
    A large class of checksum algorithms have the following pattern: Think of the bits in a file as the coefficients in a polynomial P(x). Divide P(x) by a fixed polynomial Q(x) mod 2 and keep the remainder. Report the remainder as a sequence of bits. In practice there’s a little more to the algorithm than […] Checksum polynomials first appeared on John D. Cook.  ( 6 min )
  • Open

    A Powerful Legacy: Researcher’s Mom Fueled Passion for Nuclear Fusion
    Before she entered high school, Ge Dong wanted to be a physicist like her mom, a professor at Shanghai Jiao Tong University.  ( 6 min )
    ‘Arteana’s Art Squad’ Assembles — Indie Showrunner Rafi Nizam Creates High-End Children’s Show on a Budget
    Rafi Nizam is an award-winning independent animator, director, character designer and more. He’s developed feature films at Sony Pictures, children’s series and comedies at BBC and global transmedia content at NBCUniversal.  ( 8 min )

  • Open

    Implementing Gradient Descent in PyTorch
    The gradient descent algorithm is one of the most popular techniques for training deep neural networks. It has many applications in fields such as computer vision, speech recognition, and natural language processing. While the idea of gradient descent has been around for decades, it’s only recently that it’s been applied to applications related to deep […] The post Implementing Gradient Descent in PyTorch appeared first on MachineLearningMastery.com.  ( 25 min )

  • Open

    Training a Linear Regression Model in PyTorch
    Linear regression is a simple yet powerful technique for predicting the values of variables based on other variables. It is often used for modeling relationships between two or more continuous variables, such as the relationship between income and age, or the relationship between weight and height. Likewise, linear regression can be used to predict continuous […] The post Training a Linear Regression Model in PyTorch appeared first on MachineLearningMastery.com.  ( 24 min )
    Making Linear Predictions in PyTorch
    Linear regression is a statistical technique for estimating the relationship between two variables. A simple example of linear regression is to predict the height of someone based on the square root of the person’s weight (that’s what BMI is based on). To do this, we need to find the slope and intercept of the line. […] The post Making Linear Predictions in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Loading and Providing Datasets in PyTorch
    Structuring the data pipeline in a way that it can be effortlessly linked to your deep learning model is an important aspect of any deep learning-based system. PyTorch packs everything to do just that. While in the previous tutorial, we used simple datasets, we’ll need to work with larger datasets in real world scenarios in […] The post Loading and Providing Datasets in PyTorch appeared first on MachineLearningMastery.com.  ( 20 min )

  • Open

    Using Dataset Classes in PyTorch
    In machine learning and deep learning problems, a lot of effort goes into preparing the data. Data is usually messy and needs to be preprocessed before it can be used for training a model. If the data is not prepared correctly, the model won’t be able to generalize well. Some of the common steps required […] The post Using Dataset Classes in PyTorch appeared first on MachineLearningMastery.com.  ( 21 min )

  • Open

    Calculating Derivatives in PyTorch
    Derivatives are one of the most fundamental concepts in calculus. They describe how changes in the variable inputs affect the function outputs. The objective of this article is to provide a high-level introduction to calculating derivatives in PyTorch for those who are new to the framework. PyTorch offers a convenient way to calculate derivatives for […] The post Calculating Derivatives in PyTorch appeared first on Machine Learning Mastery.  ( 20 min )

  • Open

    Two-Dimensional Tensors in Pytorch
    Two-dimensional tensors are analogous to two-dimensional metrics. Like a two-dimensional metric, a two-dimensional tensor also has $n$ number of rows and columns. Let’s take a gray-scale image as an example, which is a two-dimensional matrix of numeric values, commonly known as pixels. Ranging from ‘0’ to ‘255’, each number represents a pixel intensity value. Here, […] The post Two-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 21 min )

  • Open

    One-Dimensional Tensors in Pytorch
    PyTorch is an open-source deep learning framework based on Python language. It allows you to build, train, and deploy deep learning models, offering a lot of versatility and efficiency. PyTorch is primarily focused on tensor operations while a tensor can be a number, matrix, or a multi-dimensional array. In this tutorial, we will perform some […] The post One-Dimensional Tensors in Pytorch appeared first on Machine Learning Mastery.  ( 22 min )

  • Open

    365 Data Science courses free until November 21
    Sponsored Post   The unlimited access initiative presents a risk-free way to break into data science.     The online educational platform 365 Data Science launches the #21DaysFREE campaign and provides 100% free unlimited access to all content for three weeks. From November 1 to 21, you can take courses from renowned instructors and earn […] The post 365 Data Science courses free until November 21 appeared first on Machine Learning Mastery.  ( 15 min )

  • Open

    Attend the Data Science Symposium 2022, November 8 in Cincinnati
    Sponsored Post      Attend the Data Science Symposium 2022 on November 8 The Center for Business Analytics at the University of Cincinnati will present its annual Data Science Symposium 2022 on November 8. This all day in-person event will have three featured speakers and two tech talk tracks with four concurrent presentations in each track. The […] The post Attend the Data Science Symposium 2022, November 8 in Cincinnati appeared first on Machine Learning Mastery.  ( 10 min )

  • Open

    My family's unlikely homeschooling journey
    My husband Jeremy and I never intended to homeschool, and yet we have now, unexpectedly, committed to homeschooling long-term. Prior to the pandemic, we both worked full-time in careers that we loved and found meaningful, and we sent our daughter to a full-day Montessori school. Although I struggled with significant health issues, I felt unbelievably lucky and fulfilled in both my family life and my professional life. The pandemic upended my careful balance. Every family is different, with different needs, circumstances, and constraints, and what works for one may not work for others. My intention here is primarily to share the journey of my own (very privileged) family. Our unplanned introduction to homeschooling For the first year of the pandemic, most schools in California, where …  ( 7 min )

  • Open

    The Jupyter+git problem is now solved
    Jupyter notebooks don’t work with git by default. With nbdev2, the Jupyter+git problem has been totally solved. It provides a set of hooks which provide clean git diffs, solve most git conflicts automatically, and ensure that any remaining conflicts can be resolved entirely within the standard Jupyter notebook environment. To get started, follow the directions on Git-friendly Jupyter. Contents The Jupyter+git problem The solution The nbdev2 git merge driver The nbdev2 Jupyter save hook Background The result Postscript: other Jupyter+git tools ReviewNB An alternative solution: Jupytext nbdime The Jupyter+git problem Jupyter notebooks are a powerful tool for scientists, engineers, technical writers, students, teachers, and more. They provide an ideal notebook environment for interact…  ( 7 min )
2023-10-06T00:43:14.575Z osmosfeed 1.15.1